Abstract. We present a novel framework, namely AADMM, for the acceleration of the linearized alternating direction method of
multipliers (ADMM). The basic idea of AADMM is to incorporate a multi-step acceleration scheme into linearized ADMM. We
demonstrate that for solving a class of convex composite optimization problems with linear constraints, the rate of convergence
of AADMM is better than that of linearized ADMM, in terms of its dependence on the Lipschitz constant of the smooth component.
Moreover, AADMM is capable of dealing with the situation when the feasible region is unbounded, as long as the corresponding
saddle point problem has a solution. A backtracking algorithm is also proposed for practical performance.

1. Introduction. Assume that $\mathcal{W}$, $\mathcal{X}$ and $\mathcal{Y}$ are finite dimensional vector spaces equipped with inner
product $\langle\cdot,\cdot\rangle$, norm $\|\cdot\|$ and conjugate norm $\|\cdot\|_*$. Our problem of interest is the following affine equality
constrained composite optimization (AECCO) problem:

\[ \min_{x \in X,\, w \in \mathcal{W}} G(x) + F(w), \quad \text{s.t. } Bw - Kx = b, \tag{1.1} \]

where $X \subseteq \mathcal{X}$ is a closed convex set, $G(\cdot) : X \to \mathbb{R}$ and $F(\cdot) : \mathcal{W} \to \mathbb{R}$ are finitely valued, convex and lower
semi-continuous functions, and $K : \mathcal{X} \to \mathcal{Y}$, $B : \mathcal{W} \to \mathcal{Y}$ are bounded linear operators.

In this paper, we assume that $F(\cdot)$ is simple, in the sense that the optimization problem

\[ \min_{w \in \mathcal{W}} \frac{\eta}{2}\|w - c\|^2 + F(w), \quad \text{where } c \in \mathcal{W},\ \eta \in \mathbb{R}, \tag{1.2} \]

can be solved efficiently. We will use the term "simple" in this sense throughout this paper, and use the term
"non-simple" in the opposite sense. We assume that $G(\cdot)$ is non-simple, continuously differentiable, and that
there exists $L_G > 0$ such that

\[ G(x_2) - G(x_1) - \langle \nabla G(x_1), x_2 - x_1 \rangle \le \frac{L_G}{2}\|x_2 - x_1\|^2, \quad \forall x_1 \in X,\ x_2 \in X. \tag{1.3} \]
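To make the notion of a "simple" function concrete, the following sketch solves (1.2) in closed form for one hypothetical choice, $F(w) = \lambda\|w\|_1$ with $\eta > 0$. This choice of $F$ and the names `prox_l1`, `objective` and `lam` are assumptions made for illustration; the text only requires that (1.2) be efficiently solvable. For this $F$ the problem separates by coordinate, and the minimizer is the soft-thresholding of $c$ at level $\lambda/\eta$.

```python
def prox_l1(c, eta, lam):
    """Minimize (eta/2)*||w - c||^2 + lam*||w||_1 over w (requires eta > 0).

    The problem separates by coordinate; each coordinate of c is
    soft-thresholded at level lam/eta.
    """
    thr = lam / eta
    out = []
    for ci in c:
        if ci > thr:
            out.append(ci - thr)
        elif ci < -thr:
            out.append(ci + thr)
        else:
            out.append(0.0)
    return out


def objective(w, c, eta, lam):
    """Objective of (1.2) for the illustrative choice F = lam*||.||_1."""
    quad = 0.5 * eta * sum((wi - ci) ** 2 for wi, ci in zip(w, c))
    return quad + lam * sum(abs(wi) for wi in w)


c = [3.0, -0.5, 1.0]
w_star = prox_l1(c, eta=2.0, lam=1.0)  # threshold lam/eta = 0.5
```

A function such as $G(x) = \|Ax - c\|^2/2$ with a general matrix $A$ has no comparable closed form, which is the situation the term "non-simple" is meant to capture.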

One special case of the AECCO problem in (1.1) is when $B = I$ and $b = 0$. Under this situation, problem
(1.1) is equivalent to the following unconstrained composite optimization (UCO) problem:

\[ \min_{x \in X} f(x) := G(x) + F(Kx). \tag{1.4} \]

Both AECCO and UCO can be reformulated as saddle point problems. By the method of Lagrangian
multipliers, the AECCO problem (1.1) is equivalent to the following saddle point problem:

\[ \min_{x \in X,\, w \in \mathcal{W}} \max_{y \in \mathcal{Y}} G(x) + F(w) - \langle y, Bw - Kx - b \rangle. \tag{1.5} \]

The AECCO and UCO problems have found numerous applications in machine learning and image processing.
In most applications, $G(\cdot)$ is known as the fidelity term and $F(\cdot)$ is the regularization term. For example,
consider the following two-dimensional total variation (TV) based image reconstruction problem

\[ \min_{x \in \mathbb{F}^n} \frac{1}{2}\|Ax - c\|^2 + \lambda\|Dx\|_{2,1}, \tag{1.6} \]

where the field $\mathbb{F}$ is either $\mathbb{R}$ or $\mathbb{C}$, $x$ is the $n$-vector form of a two-dimensional complex or real valued image,
$D : \mathbb{F}^n \to \mathbb{F}^{2n}$ is the two-dimensional finite difference operator acting on the image $x$, and

\[ \|y\|_{2,1} := \sum_{i=1}^{n} \left\|\left(y^{(2i-1)}, y^{(2i)}\right)^T\right\|_2, \quad \forall y \in \mathbb{F}^{2n}, \]

where $\|\cdot\|_2$ is the Euclidean norm in $\mathbb{R}^2$. In (1.6), the regularization term $\|Dx\|_{2,1}$ is the discrete form of the TV
semi-norm. By setting $G(x) := \|Ax - c\|^2/2$, $F(\cdot) := \|\cdot\|_{2,1}$, $K = \lambda D$, $X = \mathcal{X} = \mathbb{F}^n$ and $\mathcal{W} = \mathbb{F}^{2n}$, problem
(1.6) becomes a UCO problem in (1.4).
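The regularization term in (1.6) can be evaluated directly on a small image. The sketch below is a minimal illustration under stated assumptions: the image is real valued, $D$ uses forward differences, and the difference is set to zero at the last row and column. The text does not specify the boundary handling, and the function name `tv_seminorm` is hypothetical.

```python
import math


def tv_seminorm(img):
    """Isotropic discrete TV, ||Dx||_{2,1}, of a 2-D real image (list of rows).

    Assumption: forward differences, set to zero at the last row/column.
    Each pixel contributes the Euclidean norm of its
    (horizontal, vertical) difference pair.
    """
    rows, cols = len(img), len(img[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            dx = img[r][c + 1] - img[r][c] if c + 1 < cols else 0.0
            dy = img[r + 1][c] - img[r][c] if r + 1 < rows else 0.0
            total += math.hypot(dx, dy)
    return total


flat = [[1.0, 1.0], [1.0, 1.0]]   # constant image: no variation
edge = [[0.0, 1.0], [0.0, 1.0]]   # one vertical edge of height 1
```

A constant image has zero TV, while the image with a single vertical edge accumulates one unit of variation per row crossing the edge.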

1.1. Notations and terminologies. In this subsection, we describe some necessary assumptions, notations
and terminologies that will be used throughout this paper.

We assume that there exists an optimal solution $(w^*, x^*)$ of (1.1) and that there exists $y^* \in \mathcal{Y}$ such
that $z^* := (w^*, x^*, y^*) \in Z$ is a saddle point of (1.5), where $Z := \mathcal{W} \times X \times \mathcal{Y}$. We also use the notation
$Z := \mathcal{W} \times X \times Y$ whenever a set $Y \subseteq \mathcal{Y}$ has been declared. We use $f^* := G(x^*) + F(w^*)$ to denote the optimal
objective value of problem (1.1). Since UCO problems (1.4) are special cases of AECCO (1.1), we will also
use $f^*$ to denote the optimal value $G(x^*) + F(Kx^*)$.

In view of (1.1), both the objective function value and the feasibility of the constraint should be considered
when defining approximate solutions of AECCO, hence the following definition comes naturally:

Definition 1.1. A pair $(w, x) \in \mathcal{W} \times X$ is called an $(\varepsilon, \delta)$-solution of (1.1) if

\[ G(x) + F(w) - f^* \le \varepsilon, \quad \text{and} \quad \|Bw - Kx - b\| \le \delta. \]

We say that $(w, x)$ has primal residual $\varepsilon$ and feasibility residual $\delta$. In particular, if $(w, x)$ is an $(\varepsilon, 0)$-solution,
then we simply say that it is an $\varepsilon$-solution.

The feasibility residual $\delta$ in Definition 1.1 measures the violation of the equality constraint, and the
primal residual $\varepsilon$ measures the gap between the objective value $G(x) + F(w)$ at the approximate solution
and the optimal value $f^*$. For an $(\varepsilon, \delta)$-solution $(w, x)$ where $\delta > 0$, since $(w, x)$ does not satisfy the equality
constraint in (1.1), it is possible that $G(x) + F(w) - f^* < 0$. However, as pointed out in [31], a lower bound
of $G(x) + F(w) - f^*$ is given by

\[ G(x) + F(w) - f^* \ge \langle y^*, Bw - Kx - b \rangle \ge -\delta\|y^*\|, \]

where $y^*$ is a component of $z^* = (w^*, x^*, y^*)$, a saddle point of (1.5).
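The two residuals of Definition 1.1, and the fact that the primal residual can be negative at an infeasible pair, can be checked numerically. The sketch below uses a hypothetical one-dimensional instance, $G(x) = (x-3)^2/2$, $F(w) = |w|$, $B = K = 1$, $b = 0$, for which one can verify by hand that $x^* = w^* = 2$, $y^* = 1$ and $f^* = 2.5$. The helper names are assumptions, not part of the paper.

```python
import math


def residuals(G, F, B, K, b, w, x, f_star):
    """Return (primal residual, feasibility residual) of the pair (w, x).

    B and K are dense matrices given as lists of rows; G and F are callables
    taking a list of floats.
    """
    def matvec(M, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

    Bw, Kx = matvec(B, w), matvec(K, x)
    viol = [bw - kx - bi for bw, kx, bi in zip(Bw, Kx, b)]
    primal = G(x) + F(w) - f_star
    feas = math.sqrt(sum(v * v for v in viol))
    return primal, feas


def is_eps_delta_solution(primal, feas, eps, delta):
    """Definition 1.1: both residuals must be within their tolerances."""
    return primal <= eps and feas <= delta


# Hypothetical toy instance (x* = w* = 2, y* = 1, f* = 2.5).
G = lambda x: 0.5 * (x[0] - 3.0) ** 2
F = lambda w: abs(w[0])
primal, feas = residuals(G, F, [[1.0]], [[1.0]], [0.0], [1.9], [2.1], f_star=2.5)
# primal is about -0.195 (negative), feas is about 0.2
```

Here the primal residual is negative but still no smaller than $-\delta\|y^*\| = -0.2$, in agreement with the lower bound above.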

In the remainder of this subsection, we introduce some notations that will be used throughout this paper.
The following distance constants will be used for simplicity:

\[ D_{w^*,B} := \|B(w_1 - w^*)\|, \quad D_{x^*,K} := \|K(x_1 - x^*)\|, \quad D_{x^*} := \|x_1 - x^*\|, \quad D_{y^*} := \|y_1 - y^*\|, \]

\[ D_{X,K} := \sup_{x_1, x_2 \in X} \|Kx_1 - Kx_2\|, \quad \text{and} \quad D_S := \sup_{s_1, s_2 \in S} \|s_1 - s_2\|, \ \text{for any compact set } S. \]

For example, for any compact set $Y \subset \mathcal{Y}$ we use $D_Y$ to denote the diameter of $Y$. In addition, we use
$x_{[t]}$ to denote the sequence $\{x_i\}_{i=1}^{t}$, where the $x_i$'s may be either real numbers or points in vector spaces. We
will also define a few operations on sequences. Firstly, suppose that $\mathcal{V}_1$, $\mathcal{V}_2$ are any vector
spaces, $v_{[t+1]} \subset \mathcal{V}_1$ is any sequence in $\mathcal{V}_1$ and $A : \mathcal{V}_1 \to \mathcal{V}_2$ is any operator; we use $Av_{[t+1]}$ to denote the
sequence $\{Av_i\}_{i=1}^{t+1}$. Secondly, if $\eta_{[t]}, \tau_{[t]} \subset \mathbb{R}$ are any real valued sequences, and $L \in \mathbb{R}$ is any real number,
then $\eta_{[t]} - L\tau_{[t]}$ denotes $\{\eta_i - L\tau_i\}_{i=1}^{t}$. Finally, we denote by $\eta_{[t]}^{-1}$ the reciprocal sequence $\{\eta_i^{-1}\}_{i=1}^{t}$ for any
non-zero real valued sequence $\eta_{[t]}$.

1.2. Augmented Lagrangian and alternating direction method of multipliers. In this paper,
we study AECCO problems from the aspect of the augmented Lagrangian formulation of (1.5):

\[ \min_{x \in X,\, w \in \mathcal{W}} \max_{y \in \mathcal{Y}} G(x) + F(w) - \langle y, Bw - Kx - b \rangle + \frac{\rho}{2}\|Bw - Kx - b\|^2, \tag{1.8} \]

where $\rho$ is a penalty parameter. The idea of analyzing (1.8) in order to solve (1.1) is essentially the augmented
Lagrangian method (ALM) by Hestenes [26] and Powell [44] (it was originally called the method of multipliers
in [26, 44]; see also the textbooks, e.g., [5, 41, 6]). The ALM is a special case of the Douglas-Rachford splitting
method [19, 16, 32], which is also an instance of the proximal point algorithm [17, 46]. The iteration complexity
of an inexact version of ALM, where the subproblems are solved iteratively by Nesterov's method, has been
studied in [30]. One influential variant of ALM is the ADMM algorithm [20, 21], which is an alternating
method for solving (1.8) by minimizing over $x$ and $w$ alternately and then updating the Lagrange multiplier
$y$ (see [7] for a comprehensive explanation of ALM, ADMM and other algorithms). In compressive sensing
and imaging science, the class of Bregman iterative methods is an application of the ALM and the ADMM.
In particular, the Bregman iterative method [24] is equivalent to ALM, and the split Bregman method [23] is
equivalent to ADMM.

We give a brief review of ADMM and some of its variants. The scheme of ADMM is described in
Algorithm 1.

Algorithm 1 The alternating direction method of multipliers (ADMM) for solving (1.1)

Choose $x_1 \in X$, $w_1 \in \mathcal{W}$ and $y_1 \in \mathcal{Y}$.
for $t = 1, \ldots, N-1$ do

\[ x_{t+1} = \operatorname*{argmin}_{x \in X} G(x) - \langle y_t, Bw_t - Kx - b \rangle + \frac{\rho}{2}\|Bw_t - Kx - b\|^2, \tag{1.9} \]

\[ w_{t+1} = \operatorname*{argmin}_{w \in \mathcal{W}} F(w) - \langle y_t, Bw - Kx_{t+1} - b \rangle + \frac{\rho}{2}\|Bw - Kx_{t+1} - b\|^2, \tag{1.10} \]

\[ y_{t+1} = y_t - \rho(Bw_{t+1} - Kx_{t+1} - b). \tag{1.11} \]

end for
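The three steps of Algorithm 1 can be sketched on a small instance in which every subproblem has a closed form. The instance is a hypothetical choice for illustration: $G(x) = \|x - a\|^2/2$, $F(w) = \lambda\|w\|_1$, $B = K = I$, $b = 0$ and $X$ the whole space, whose solution is the soft-thresholding of $a$ at level $\lambda$. Here $G$ is simple, so (1.9) can be solved exactly; the function names are assumptions.

```python
def soft_threshold(v, thr):
    """Componentwise soft-thresholding, the proximal map of thr*||.||_1."""
    return [max(abs(vi) - thr, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def admm_l1_denoise(a, lam, rho=1.0, iters=200):
    """Algorithm 1 for min 0.5*||x - a||^2 + lam*||w||_1  s.t.  w - x = 0."""
    n = len(a)
    x, w, y = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(iters):
        # (1.9): minimize 0.5*(x-a)^2 - y*(w-x) + (rho/2)*(w-x)^2 over x;
        # the first-order condition is (x-a) + y - rho*(w-x) = 0.
        x = [(ai - yi + rho * wi) / (1.0 + rho) for ai, yi, wi in zip(a, y, w)]
        # (1.10): minimize lam*|w| - y*w + (rho/2)*(w-x)^2 over w.
        w = soft_threshold([xi + yi / rho for xi, yi in zip(x, y)], lam / rho)
        # (1.11): multiplier update.
        y = [yi - rho * (wi - xi) for yi, wi, xi in zip(y, w, x)]
    return x, w, y


x_out, w_out, y_out = admm_l1_denoise([3.0, -0.5, 0.2], lam=1.0)
```

On this instance the iterates approach $x = w = (2, 0, 0)$, the soft-thresholding of $a = (3, -0.5, 0.2)$ at level 1.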

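For non-simple $G$, a linearized ADMM (L-ADMM) scheme generates the iterate $x_{t+1}$ in (1.9) by

\[ x_{t+1} = \operatorname*{argmin}_{x \in X} \langle \nabla G(x_t), x \rangle + \langle y_t, Kx \rangle + \frac{\rho}{2}\|Bw_t - Kx - b\|^2 + \frac{\eta_t}{2}\|x - x_t\|^2. \tag{1.12} \]

We may also linearize $\|Bw_t - Kx - b\|^2$, and generate $x_{t+1}$ by

\[ x_{t+1} = \operatorname*{argmin}_{x \in X} G(x) + \langle y_t, Kx \rangle - \rho\langle Bw_t - Kx_t - b, Kx \rangle + \frac{\eta_t}{2}\|x - x_t\|^2, \tag{1.13} \]

as discussed in [18, 10]. This variant is called the preconditioned ADMM (P-ADMM). If we linearize both
$G(x)$ and $\|Bw_t - Kx - b\|^2$, we have the linearized preconditioned ADMM (LP-ADMM), in which (1.9) is
changed to

\[ x_{t+1} = \operatorname*{argmin}_{x \in X} \langle \nabla G(x_t), x \rangle + \langle y_t, Kx \rangle - \rho\langle Bw_t - Kx_t - b, Kx \rangle + \frac{\eta_t}{2}\|x - x_t\|^2. \tag{1.14} \]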
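The practical difference between these variants shows up when the subproblems are solved. As a worked illustration, assume that $X = \mathcal{X}$ (no constraint on $x$), that the norm is Euclidean, and that the coefficient of the proximal term is some $\eta_t > 0$; these are assumptions made here, not requirements of the text. Writing $K^*$ for the adjoint of $K$, the LP-ADMM subproblem (1.14) is then an unconstrained strongly convex quadratic, and setting its gradient to zero gives an explicit gradient-type step, whereas the L-ADMM subproblem (1.12) keeps the quadratic penalty and requires a linear solve:

```latex
% LP-ADMM (1.14): explicit step
\nabla G(x_t) + K^* y_t - \rho K^*(Bw_t - Kx_t - b) + \eta_t (x_{t+1} - x_t) = 0
\;\Longrightarrow\;
x_{t+1} = x_t - \frac{1}{\eta_t}\Big[\nabla G(x_t) + K^*\big(y_t - \rho(Bw_t - Kx_t - b)\big)\Big].

% L-ADMM (1.12): linear system in x_{t+1}
(\eta_t I + \rho K^* K)\, x_{t+1} = \eta_t x_t - \nabla G(x_t) - K^* y_t + \rho K^*(Bw_t - b).
```

Under these assumptions LP-ADMM needs only applications of $K$ and $K^*$ per iteration, while L-ADMM needs a solve with $\eta_t I + \rho K^*K$.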

There have been several works on the convergence analysis and applications of ADMM, L-ADMM, and
P-ADMM. It is shown in [10] that P-ADMM (Algorithm 1 with $\theta = 1$ in [10]) solves the UCO problem with
rate of convergence

\[ \mathcal{O}\left(\frac{\|K\| D^2}{N}\right), \]

where $N$ is the number of iterations and $D$ depends on the distances $D_{x^*}$ and $D_{y^*}$. There are also several
works concerning the tuning of the stepsize $\eta_t$ in L-ADMM, including [50, 51, 11].
For AECCO problems, in [34] ADMM is treated as an instance of the block-decomposition hybrid proximal
extragradient (BD-HPE) method, and it is proved that the rate of convergence of the primal residual of ADMM for
solving AECCO is

\[ \mathcal{O}\left(\frac{D^2}{N}\right), \]

where $D$ depends on $B$, $D_{x^*}$ and $D_{y^*}$. In [25], the convergence analysis of ADMM and P-ADMM is studied
based on the variational inequality formulation of (1.5), in which a similar rate of convergence is achieved under
the assumption that both the primal and dual feasible sets in (1.5) are bounded. In [42], it is shown that if $X$
is compact, then the rate of convergence of ADMM and L-ADMM for solving the AECCO problem is

\[ G(x^N) + F(w^N) - f^* + \rho\|Bw^N - Kx^N - b\|^2 \le \mathcal{O}\left(\frac{L_G D_X^2}{N} + \frac{\rho D_{y^*,B}^2}{N}\right), \quad \forall \rho > 0, \tag{1.15} \]

where $(x^N, w^N)$ is the average of the first $N$ iterates of the ADMM algorithm. The result in (1.15) is stronger than
the results in [34, 25], in the sense that both primal and feasibility residuals are included in (1.15), while in
[34, 25] there is no discussion of the feasibility residual. However, the rate of convergence of the feasibility
residual is still not very clear in (1.15), considering that $G(x^N) + F(w^N) - f^*$ can be negative.

1.3. Accelerated methods for AECCO and UCO problems. In a seminal paper [39], Nesterov
introduced a smoothing technique and a fast first-order method that solves a class of composite optimization problems.
When applied to UCO problems, Nesterov's method has the optimal rate of convergence

\[ \mathcal{O}\left(\frac{L_G D_{x^*}^2}{N^2} + \frac{\|K\| D_{x^*} D_Y}{N}\right), \tag{1.16} \]

where $Y$ is the bounded dual space of the UCO problem. Following the breakthrough in [40], much effort has
been devoted to the development of more efficient first-order methods for non-smooth optimization (see, e.g.,
[38, 1, 29, 15, 43, 48, 4, 28]). Although the rate in (1.16) is also $\mathcal{O}(1/N)$, what makes it more attractive is
that it allows a very large Lipschitz constant $L_G$. In particular, $L_G$ can be as large as $\Omega(N)$, without affecting
the rate of convergence (up to a constant factor). However, it should be noted that the boundedness of $Y$
is critical for the convergence analysis of Nesterov's smoothing scheme. Following [40], there have also been
several studies on the AECCO and UCO problems, and it has been shown that better acceleration results
can be obtained if more assumptions are enforced for the AECCO and UCO problems. We give a list of such
assumptions and results.

1). Excessive gap technique. The excessive gap technique is proposed in [38] for solving the UCO problem
in which $G$ is simple. Compared with [40], the method in [38] does not require the total number of
iterations $N$ to be fixed in advance. Furthermore, if $G(\cdot)$ is strongly convex, it is shown that the rate
of convergence of the excessive gap technique is $\mathcal{O}(1/N^2)$.

2). Special instance. For the UCO problem, if $K = I$ and $G$ is simple, an accelerated method with
skipping steps is proposed in Algorithm 7 of [22], which achieves an $\mathcal{O}(1/N^2)$ rate of convergence. The
result is better than (1.16), but at the cost of evaluating objective function values in each iteration.
For the AECCO problem with compact feasible sets, it is shown in [33] that if $G(\cdot)$ is a composition of
a strictly convex function and a linear transformation and $F(\cdot)$ is the weighted sum of a 1-norm and
some 2-norms, the asymptotic rate of convergence of the ADMM method and its variants is R-linear.

3). Strong convexity. In [10], for solving the UCO problem in which $G$ is simple, the authors showed
that P-ADMM is equivalent to their proposed method, and furthermore, if either $G(\cdot)$ or $F^*(\cdot)$ is
uniformly convex, then the rate of convergence of their method can be accelerated to $\mathcal{O}(1/N^2)$. It
is worth noting that this rate of convergence is weaker since it uses a different termination criterion.
In addition, if both $G(\cdot)$ and $F^*(\cdot)$ are uniformly convex (hence the objective function in (1.4) is
continuously differentiable), the proposed method in [10] converges linearly. When both $G(x)$ and
$F(x)$ are strongly convex in the AECCO problem, an accelerated ADMM method is proposed in [23],
which achieves the $\mathcal{O}(1/N^2)$ rate of convergence.

¹ It is assumed in [40] that $X$ is compact, hence the rate of convergence is dependent on $D_X$. However, the analysis in [40] is
also applicable for the case when $X$ is unbounded, yielding (1.16).

It  should  be  noted  that  all  the  methods  in  the  above  list  require  more  assumptions  on  the  AECCO  and 
UCO  problems  (e.g.,  simplicity  of  G(-),  strong  convexity  of  G(-)  or  F(-)),  in  comparison  with  Nesterov’s 
smoothing  scheme.  More  recently,  we  proposed  an  accelerated  primal-dual  (APD)  method  for  solving  the 
UCO  problem  [13],  which  has  the  same  optimal  rate  of  convergence  (1.16)  as  that  of  Nesterov’s  smoothing 
scheme  in  [40].  The  advantage  of  the  APD  method  over  Nesterov’s  smoothing  scheme  is  that  it  does  not  require 
boundedness  on  either  X  or  Y .  The  basic  idea  of  the  APD  method  is  to  incorporate  a  multi-step  acceleration 
into  LP-ADMM,  and  this  has  motivated  our  studies  on  accelerating  the  linearized  ADMM  method  for  solving 
the  AECCO  and  UCO  problems. 

1.4. Contribution of the paper. The main interest of this paper is to develop an accelerated linearized
ADMM algorithm for solving AECCO and UCO problems, in which $G$ is a general convex and non-simple
function. Our contribution in this paper mainly consists of the following aspects.

Firstly, we propose an accelerated framework for ADMM (AADMM), which consists of two novel accelerated
linearized ADMM methods, namely, accelerated L-ADMM (AL-ADMM) and accelerated LP-ADMM (ALP-ADMM).
We prove that AL-ADMM and ALP-ADMM have better rates of convergence than L-ADMM and
LP-ADMM in terms of their dependence on $L_G$. In particular, we prove that both accelerated methods can
achieve rates similar to (1.16), hence both of them can efficiently solve problems with a large Lipschitz constant
$L_G$ (as large as $\Omega(N)$). We show that L-ADMM and LP-ADMM are special instances of AL-ADMM and
ALP-ADMM respectively, with rates of convergence $\mathcal{O}(1/N)$. To improve the performance in practice, we also
propose a simple backtracking technique for searching the Lipschitz constants $L_G$ and $\|K\|$.

Secondly, the proposed framework solves both AECCO and UCO problems with unbounded feasible sets,
as long as a saddle point of problem (1.5) exists. Instead of using the perturbation type gap function in [13],
our convergence analysis is performed directly on both the primal and feasibility residuals. The estimate of
the rate of convergence will depend on the distance from the initial point to the set of optimal solutions.

2. An accelerated ADMM framework. In this section, we propose an accelerated ADMM framework
for solving AECCO (1.1) and UCO (1.4). The proposed framework, namely AADMM, is presented in
Algorithm 2.

In AADMM, the binary constant $\chi$ in (2.2) is either 0 or 1, the superscript "ag" stands for "aggregate",
and "md" stands for "middle". It can be seen that the middle point $x_t^{md}$ and the aggregate points $x_{t+1}^{ag}$, $w_{t+1}^{ag}$
and $y_{t+1}^{ag}$ are weighted sums of the previous iterates in $\{x_i\}_{i=1}^{t+1}$, $\{w_i\}_{i=1}^{t+1}$ and $\{y_i\}_{i=1}^{t+1}$. If
the weights $\alpha_t \equiv 1$, then $x_t^{md} = x_t$ and the aggregate points are exactly the current iterates $w_{t+1}$, $x_{t+1}$ and
$y_{t+1}$. In this case, if $\chi = 0$ and $\theta_t = \tau_t = \rho_t = \rho$, then AADMM becomes L-ADMM, and if in addition $G$ is
simple, then AADMM becomes ADMM. On the other hand, if $\chi = 1$, then AADMM becomes LP-ADMM,
and if in addition $G$ is simple, AADMM becomes P-ADMM.

In this work, we will show that if $G$ is non-simple, by properly specifying the parameter $\alpha_t$, we can
significantly improve the rate of convergence of Algorithm 2 in terms of its dependence on $L_G$, with about the
same iteration cost. We call the acceleration for $\chi = 0$ the accelerated L-ADMM (AL-ADMM), and call that
for $\chi = 1$ the accelerated LP-ADMM (ALP-ADMM).
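The structure of one AADMM iteration, steps (2.1)-(2.7) of Algorithm 2, can be sketched for the case $\chi = 1$ on a hypothetical instance: $G(x) = \|x - a\|^2/2$ (so $L_G = 1$), $F(w) = \lambda\|w\|_1$, $B = K = I$, $b = 0$ and $X$ the whole space. All names below are assumptions for illustration. The parameter sequences $\alpha_t$, $\eta_t$, $\theta_t$, $\tau_t$, $\rho_t$ are passed in by the caller, since this section does not yet fix them; the usage example only exercises the non-accelerated special case $\alpha_t \equiv 1$.

```python
def soft_threshold(v, thr):
    """Componentwise soft-thresholding, the proximal map of thr*||.||_1."""
    return [max(abs(vi) - thr, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def aadmm_l1_denoise(a, lam, alpha, eta, theta, tau, rho, iters):
    """Algorithm 2 with chi = 1 for min 0.5*||x-a||^2 + lam*||w||_1 s.t. w - x = 0.

    alpha, eta, theta, tau, rho are callables t -> parameter value (t = 1, 2, ...).
    Here K = B = I, b = 0, X is the whole space and grad G(x) = x - a.
    Returns the aggregate point (w_ag, x_ag, y_ag) and the last iterate (w, x, y).
    """
    n = len(a)
    x, w, y = [0.0] * n, [0.0] * n, [0.0] * n   # satisfies B w_1 = K x_1 + b
    x_ag, w_ag, y_ag = x[:], w[:], y[:]
    for t in range(1, iters + 1):
        al, et, th, ta, rh = alpha(t), eta(t), theta(t), tau(t), rho(t)
        # (2.1) middle point
        x_md = [(1 - al) * xa + al * xi for xa, xi in zip(x_ag, x)]
        # (2.2) with chi = 1: the subproblem is quadratic, so the step is explicit
        grad = [xm - ai for xm, ai in zip(x_md, a)]
        x_new = [xi - (g + yi - th * (wi - xi)) / et
                 for xi, g, yi, wi in zip(x, grad, y, w)]
        # (2.4) proximal step on F
        w_new = soft_threshold([xn + yi / ta for xn, yi in zip(x_new, y)], lam / ta)
        # (2.6) multiplier update
        y_new = [yi - rh * (wn - xn) for yi, wn, xn in zip(y, w_new, x_new)]
        # (2.3), (2.5), (2.7) aggregate points
        x_ag = [(1 - al) * p + al * q for p, q in zip(x_ag, x_new)]
        w_ag = [(1 - al) * p + al * q for p, q in zip(w_ag, w_new)]
        y_ag = [(1 - al) * p + al * q for p, q in zip(y_ag, y_new)]
        x, w, y = x_new, w_new, y_new
    return (w_ag, x_ag, y_ag), (w, x, y)


const = lambda v: (lambda t: v)
# alpha_t = 1 and theta_t = tau_t = rho_t = 1: the LP-ADMM special case.
agg, cur = aadmm_l1_denoise([3.0, -0.5, 0.2], 1.0,
                            const(1.0), const(2.0), const(1.0), const(1.0), const(1.0), 200)
```

With $\alpha_t \equiv 1$ the aggregate point coincides with the current iterate, as stated above, and on this instance the iterates approach $x = w = (2, 0, 0)$. An accelerated run would replace `const(1.0)` for `alpha` (and the other constants) with the schedules specified by the convergence theory.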

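Algorithm 2 Accelerated ADMM (AADMM) framework

Choose $x_1 \in X$ and $w_1 \in \mathcal{W}$ such that $Bw_1 = Kx_1 + b$. Set $x_1^{ag} = x_1$, $w_1^{ag} = w_1$ and $y_1^{ag} = y_1 = 0$.
for $t = 1, \ldots, N-1$ do

\[ x_t^{md} = (1 - \alpha_t)x_t^{ag} + \alpha_t x_t, \tag{2.1} \]

\[ x_{t+1} = \operatorname*{argmin}_{x \in X} \langle \nabla G(x_t^{md}), x \rangle - \chi\theta_t\langle Bw_t - Kx_t - b, Kx \rangle + \frac{(1-\chi)\theta_t}{2}\|Bw_t - Kx - b\|^2 + \langle y_t, Kx \rangle + \frac{\eta_t}{2}\|x - x_t\|^2, \tag{2.2} \]

\[ x_{t+1}^{ag} = (1 - \alpha_t)x_t^{ag} + \alpha_t x_{t+1}, \tag{2.3} \]

\[ w_{t+1} = \operatorname*{argmin}_{w \in \mathcal{W}} F(w) - \langle y_t, Bw \rangle + \frac{\tau_t}{2}\|Bw - Kx_{t+1} - b\|^2, \tag{2.4} \]

\[ w_{t+1}^{ag} = (1 - \alpha_t)w_t^{ag} + \alpha_t w_{t+1}, \tag{2.5} \]

\[ y_{t+1} = y_t - \rho_t(Bw_{t+1} - Kx_{t+1} - b), \tag{2.6} \]

\[ y_{t+1}^{ag} = (1 - \alpha_t)y_t^{ag} + \alpha_t y_{t+1}. \tag{2.7} \]

end for
Output $z_N^{ag} = (w_N^{ag}, x_N^{ag}, y_N^{ag})$.

Next, we define certain appropriate gap functions.

2.1. Gap functions. For any $\tilde z = (\tilde w, \tilde x, \tilde y) \in Z$ and $z = (w, x, y) \in Z$, we define

\[ Q(\tilde w, \tilde x, \tilde y; w, x, y) := [G(x) + F(w) - \langle \tilde y, Bw - Kx - b \rangle] - [G(\tilde x) + F(\tilde w) - \langle y, B\tilde w - K\tilde x - b \rangle]. \tag{2.8} \]

For simplicity, we use the notation $Q(\tilde z; z) := Q(\tilde w, \tilde x, \tilde y; w, x, y)$, and under different situations, we may use
the notations $Q(\tilde z; w, x, y)$ or $Q(\tilde w, \tilde x, \tilde y; z)$ for the same meaning. We can see that $Q(z^*, z) \ge 0$ and $Q(z, z^*) \le 0$
for all $z \in Z$, where $z^*$ is a saddle point of (1.5), as defined in Section 1.1. For compact sets $W \subset \mathcal{W}$, $X \subset \mathcal{X}$,
$Y \subset \mathcal{Y}$, the duality gap function

\[ \sup_{\tilde w \in W,\, \tilde x \in X,\, \tilde y \in Y} Q(\tilde w, \tilde x, \tilde y; w, x, y) \tag{2.9} \]

measures the accuracy of an approximate solution $(w, x, y)$ to the saddle point problem

\[ \min_{x \in X,\, w \in W} \max_{y \in Y} G(x) + F(w) - \langle y, Bw - Kx - b \rangle. \]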

However, our problem of interest (1.1) has a saddle point formulation (1.5), in which the feasible set $(\mathcal{W}, X, \mathcal{Y})$
may be unbounded. Recently, a perturbation-based termination criterion was employed by Monteiro and Svaiter
[35, 36, 34] for solving variational inequalities and saddle point problems. This termination criterion is based
on the enlargement of a maximal monotone operator, which was first introduced in [8]. One advantage of using
this termination criterion is that its definition does not depend on the boundedness of the domain of the
operator. We modify this termination criterion and propose a modified version of the gap function in (2.9).
More specifically, we define

\[ g_Y(v, z) := \sup_{y \in Y} Q(w^*, x^*, y; z) + \langle v, y \rangle \tag{2.10} \]

for any closed set $Y \subseteq \mathcal{Y}$, and for any $z \in Z$ and $v \in \mathcal{Y}$. In addition, we denote

\[ g_Y(z) := g_Y(0, z) = \sup_{y \in Y} Q(w^*, x^*, y; z). \tag{2.11} \]

If $Y = \mathcal{Y}$, we will omit the subscript $Y$ and simply use the notations $g(v, z)$ and $g(z)$.
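The gap function (2.8) and the perturbed gap (2.10) can be evaluated directly on a one-dimensional instance. The sketch below assumes the hypothetical problem $G(x) = (x - a)^2/2$, $F(w) = \lambda|w|$, $B = K = 1$, $b = 0$ with $a = 3$, $\lambda = 1$, whose saddle point is $z^* = (w^*, x^*, y^*) = (2, 2, 1)$, and takes $Y = [-r, r]$. Since $Q(w^*, x^*, y; z) + \langle v, y \rangle$ is affine in $y$, the supremum over an interval is attained at an endpoint. The function names are assumptions.

```python
def lagrangian(x, w, y, a, lam):
    """G(x) + F(w) - <y, Bw - Kx - b> for G = 0.5*(x-a)^2, F = lam*|w|, B = K = 1, b = 0."""
    return 0.5 * (x - a) ** 2 + lam * abs(w) - y * (w - x)


def Q(z_ref, z, a, lam):
    """Gap function (2.8): Q(z_ref; z) with z_ref = (w~, x~, y~) and z = (w, x, y)."""
    w_r, x_r, y_r = z_ref
    w, x, y = z
    return lagrangian(x, w, y_r, a, lam) - lagrangian(x_r, w_r, y, a, lam)


def g_Y(v, z, r, z_star, a, lam):
    """Perturbed gap (2.10) for Y = [-r, r]; the sup of an affine function of y
    over an interval is attained at one of the two endpoints."""
    w_s, x_s, _ = z_star
    return max(Q((w_s, x_s, y), z, a, lam) + v * y for y in (-r, r))


z_star = (2.0, 2.0, 1.0)          # saddle point of the toy instance
z = (1.9, 2.1, 0.0)               # an infeasible trial point: w - x = -0.2
gap = g_Y(0.0, z, 1.0, z_star, 3.0, 1.0)
```

For the trial point, $G(x) + F(w) - f^* \approx -0.195$ and $|w - x| = 0.2$, so $g_Y(z) \approx 0.005$ with $r = 1$: the supremum over $y$ adds a penalty for the constraint violation to the possibly negative primal residual.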

In Propositions 2.1 and 2.2 below, we describe the relationship between the gap functions (2.10)-(2.11)
and the approximate solutions to problems (1.1) and (1.4).

Proposition 2.1. For any $Y \subseteq \mathcal{Y}$, if $g_Y(Bw - Kx - b, z) \le \varepsilon < \infty$ and $\|Bw - Kx - b\| \le \delta$ where
$z = (w, x, y) \in Z$, then $(w, x)$ is an $(\varepsilon, \delta)$-solution of (1.1). In particular, when $Y = \mathcal{Y}$, for any $v$ such that
$g(v, z) \le \varepsilon < \infty$ and $\|v\| \le \delta$, we always have $v = Bw - Kx - b$.

Proof. By (2.8) and (2.10), for all $v \in \mathcal{Y}$ and $Y \subseteq \mathcal{Y}$, we have

\[ \begin{aligned} g_Y(v, z) &= \sup_{y \in Y} [G(x) + F(w) - \langle y, Bw - Kx - b \rangle] - [G(x^*) + F(w^*)] + \langle v, y \rangle \\ &= G(x) + F(w) - f^* + \sup_{y \in Y} \langle -y, Bw - Kx - b - v \rangle. \end{aligned} \]

From the above we see that if $g_Y(Bw - Kx - b, z) = G(x) + F(w) - f^* \le \varepsilon$ and $\|Bw - Kx - b\| \le \delta$, then
$(w, x)$ is an $(\varepsilon, \delta)$-solution. In addition, if $Y = \mathcal{Y}$, we can also see that $g(v, z) = \infty$ if $v \ne Bw - Kx - b$, hence
$g(v, z) < \infty$ implies that $v = Bw - Kx - b$. □

From Proposition 2.1 we can see that when $Y = \mathcal{Y}$ and $g(v, z) \le \varepsilon$, $\|v\|$ is always the feasibility residual
of the approximate solution $(w, x)$. Proposition 2.2 below shows that in some special cases, there exists an
approximate solution to problem (1.1) that has zero feasibility residual.

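Proposition 2.2. Assume that $B$ is a one-to-one linear operator such that $B\mathcal{W} = \mathcal{Y}$, and $F(\cdot)$ is
Lipschitz continuous, then the set $Y := (B^*)^{-1} \operatorname{dom} F^*$ is bounded. Moreover, if $g_Y(z) \le \varepsilon$, then the pair
$(\tilde w, x)$ is an $\varepsilon$-solution of (1.1), where $\tilde w = B^{-1}(Kx + b)$.

Proof. We can see that $\tilde w$ is well-defined since $B$ is one-to-one and $B\mathcal{W} = \mathcal{Y}$. Also, using the fact that $F(\cdot)$ is
finite valued and Lipschitz continuous, by Corollary 13.3.3 in [45] we know that $\operatorname{dom} F^*$ is bounded, hence $Y$ is
bounded. In addition, as $B\tilde w - Kx - b = 0$,

\[ \begin{aligned} g_Y(z) &= \sup_{y \in Y} [G(x) + F(w) - \langle y, Bw - Kx - b \rangle] - [G(x^*) + F(w^*)] \\ &= G(x) + F(w) - f^* + \sup_{y \in Y} \langle -y, Bw - B\tilde w \rangle \\ &= G(x) + F(\tilde w) - f^* + \sup_{y \in Y} [F(w) - F(\tilde w) - \langle B^* y, w - \tilde w \rangle]. \end{aligned} \]

If $B^* Y \cap \partial F(\tilde w) \ne \emptyset$, then from the convexity of $F(\cdot)$ we have

\[ g_Y(z) \ge G(x) + F(\tilde w) - f^*, \]

thus $(\tilde w, x)$ is an $\varepsilon$-solution. To finish the proof it suffices to show that $B^* Y \cap \partial F(\tilde w) \ne \emptyset$. Observing that

\[ \sup_{\hat w \in B^* Y} \langle \tilde w, \hat w \rangle - F^*(\hat w) = \sup_{\hat w \in \operatorname{dom} F^*} \langle \tilde w, \hat w \rangle - F^*(\hat w) = \sup_{\hat w \in \mathcal{W}} \langle \tilde w, \hat w \rangle - F^*(\hat w), \]

and using the fact that $Y$ is closed, we can conclude that there exists $B^* \tilde y \in B^* Y$ such that $B^* \tilde y$ attains the
supremum of the function $\langle \tilde w, \hat w \rangle - F^*(\hat w)$ with respect to $\hat w$. By Theorem 23.5 in [45], we have $B^* \tilde y \in \partial F(\tilde w)$,
and hence $\partial F(\tilde w) \cap B^* Y \ne \emptyset$. □

A direct consequence of the above proposition is that for the UCO problem, if $F(\cdot)$ is Lipschitz continuous
and $g_Y(z) \le \varepsilon$, then the pair $(w, x) = (Kx, x)$ is an $\varepsilon$-solution.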

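2.2. Main estimations. In this subsection, we present the main estimates that will be used to prove
the rate of convergence for AADMM.

Lemma 2.3. Let

\[ \Gamma_t := \begin{cases} \Gamma_1, & \text{when } \alpha_t = 1, \\ (1 - \alpha_t)\Gamma_{t-1}, & \text{when } t > 1. \end{cases} \tag{2.12} \]

For all $y \in \mathcal{Y}$, the iterates $\{z_t^{ag}\}_{t \ge 1} := \{(w_t^{ag}, x_t^{ag}, y_t^{ag})\}_{t \ge 1}$ of Algorithm 2 satisfy

\[ \begin{aligned} &\frac{1}{\Gamma_t} Q(w^*, x^*, y; z_{t+1}^{ag}) - \sum_{i=2}^{t} \left( \frac{1 - \alpha_i}{\Gamma_i} - \frac{1}{\Gamma_{i-1}} \right) Q(w^*, x^*, y; z_i^{ag}) \\ &\quad \le \mathcal{B}_t(x^*, x_{[t+1]}, \eta_{[t]}) + \mathcal{B}_t(y, y_{[t+1]}, \rho_{[t]}^{-1}) + \mathcal{B}_t(Bw^*, Bw_{[t+1]}, \theta_{[t]}) - \chi\mathcal{B}_t(Kx^*, Kx_{[t+1]}, \theta_{[t]}) \\ &\qquad - \sum_{i=1}^{t} \frac{\alpha_i(\tau_i - \theta_i)}{2\Gamma_i} \|Bw_{i+1} - Kx^* - b\|^2 + \sum_{i=1}^{t} \frac{\alpha_i(\tau_i - \theta_i)}{2\Gamma_i} \|K(x_{i+1} - x^*)\|^2 \\ &\qquad - \sum_{i=1}^{t} \frac{\alpha_i(\tau_i - \rho_i)}{2\Gamma_i \rho_i^2} \|y_i - y_{i+1}\|^2 - \sum_{i=1}^{t} \frac{\alpha_i}{2\Gamma_i} \left( \eta_i - L_G\alpha_i - \chi\theta_i\|K\|^2 \right) \|x_i - x_{i+1}\|^2, \end{aligned} \tag{2.13} \]

where the term $\mathcal{B}_t(\cdot, \cdot, \cdot)$ is defined as follows: for any point $v$ and any sequence $v_{[t+1]}$ in any vector space
$\mathcal{V}$, and any real valued sequence $\gamma_{[t]}$,

\[ \mathcal{B}_t(v, v_{[t+1]}, \gamma_{[t]}) := \sum_{i=1}^{t} \frac{\alpha_i \gamma_i}{2\Gamma_i} \left( \|v_i - v\|^2 - \|v_{i+1} - v\|^2 \right). \tag{2.14} \]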
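Proof. To start with, we prove an important property of the function $Q(\cdot, \cdot)$ under Algorithm 2. By
(1.3) we have

\[ G(x_{t+1}^{ag}) \le G(x_t^{md}) + \langle \nabla G(x_t^{md}), x_{t+1}^{ag} - x_t^{md} \rangle + \frac{L_G}{2}\|x_{t+1}^{ag} - x_t^{md}\|^2. \tag{2.15} \]

Moreover, by equations (2.1) and (2.3), $x_{t+1}^{ag} - x_t^{md} = \alpha_t(x_{t+1} - x_t)$. Using this observation, equation (2.15)
and the convexity of $G(\cdot)$, we have

\[ \begin{aligned} G(x_{t+1}^{ag}) &\le G(x_t^{md}) + \langle \nabla G(x_t^{md}), x_{t+1}^{ag} - x_t^{md} \rangle + \frac{L_G\alpha_t^2}{2}\|x_{t+1} - x_t\|^2 \\ &= G(x_t^{md}) + (1 - \alpha_t)\langle \nabla G(x_t^{md}), x_t^{ag} - x_t^{md} \rangle + \alpha_t\langle \nabla G(x_t^{md}), x_{t+1} - x_t^{md} \rangle + \frac{L_G\alpha_t^2}{2}\|x_{t+1} - x_t\|^2 \\ &= (1 - \alpha_t)\left[ G(x_t^{md}) + \langle \nabla G(x_t^{md}), x_t^{ag} - x_t^{md} \rangle \right] + \alpha_t\left[ G(x_t^{md}) + \langle \nabla G(x_t^{md}), x_{t+1} - x_t^{md} \rangle \right] + \frac{L_G\alpha_t^2}{2}\|x_{t+1} - x_t\|^2 \\ &= (1 - \alpha_t)\left[ G(x_t^{md}) + \langle \nabla G(x_t^{md}), x_t^{ag} - x_t^{md} \rangle \right] + \alpha_t\left[ G(x_t^{md}) + \langle \nabla G(x_t^{md}), x - x_t^{md} \rangle \right] \\ &\quad + \alpha_t\langle \nabla G(x_t^{md}), x_{t+1} - x \rangle + \frac{L_G\alpha_t^2}{2}\|x_{t+1} - x_t\|^2 \\ &\le (1 - \alpha_t)G(x_t^{ag}) + \alpha_t G(x) + \alpha_t\langle \nabla G(x_t^{md}), x_{t+1} - x \rangle + \frac{L_G\alpha_t^2}{2}\|x_{t+1} - x_t\|^2, \quad \forall x \in X. \end{aligned} \tag{2.16} \]

By (2.3), (2.5), (2.7), (2.8), (2.16) and the convexity of $F(\cdot)$, we conclude that, for any $z = (w, x, y) \in Z$,

\[ \begin{aligned} &Q(z; z_{t+1}^{ag}) - (1 - \alpha_t)Q(z; z_t^{ag}) \\ &= [G(x_{t+1}^{ag}) + F(w_{t+1}^{ag}) - \langle y, Bw_{t+1}^{ag} - Kx_{t+1}^{ag} - b \rangle] - [G(x) + F(w) - \langle y_{t+1}^{ag}, Bw - Kx - b \rangle] \\ &\quad - (1 - \alpha_t)[G(x_t^{ag}) + F(w_t^{ag}) - \langle y, Bw_t^{ag} - Kx_t^{ag} - b \rangle] + (1 - \alpha_t)[G(x) + F(w) - \langle y_t^{ag}, Bw - Kx - b \rangle] \\ &= [G(x_{t+1}^{ag}) - (1 - \alpha_t)G(x_t^{ag}) - \alpha_t G(x)] + [F(w_{t+1}^{ag}) - (1 - \alpha_t)F(w_t^{ag}) - \alpha_t F(w)] \\ &\quad - \alpha_t\langle y, Bw_{t+1} - Kx_{t+1} - b \rangle + \alpha_t\langle y_{t+1}, Bw - Kx - b \rangle \\ &\le \alpha_t\Big\{ \langle \nabla G(x_t^{md}), x_{t+1} - x \rangle + [F(w_{t+1}) - F(w)] + \frac{L_G\alpha_t}{2}\|x_{t+1} - x_t\|^2 \\ &\qquad\quad - \langle y, Bw_{t+1} - Kx_{t+1} - b \rangle + \langle y_{t+1}, Bw - Kx - b \rangle \Big\}. \end{aligned} \tag{2.17} \]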

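Next, we examine the optimality conditions in (2.2) and (2.4). For all $x \in X$ and $w \in \mathcal{W}$, we have

\[ \begin{aligned} &\langle \nabla G(x_t^{md}) + \eta_t(x_{t+1} - x_t), x_{t+1} - x \rangle - \langle \theta_t(Bw_t - K\tilde x_t - b) - y_t, K(x_{t+1} - x) \rangle \le 0, \text{ and} \\ &F(w_{t+1}) - F(w) + \langle \tau_t(Bw_{t+1} - Kx_{t+1} - b) - y_t, B(w_{t+1} - w) \rangle \le 0, \end{aligned} \]

where

\[ \tilde x_t := \chi x_t + (1 - \chi)x_{t+1}. \tag{2.18} \]

Observing from (2.6) that $Bw_{t+1} - Kx_{t+1} - b = (y_t - y_{t+1})/\rho_t$ and $Bw_t - K\tilde x_t - b = (y_t - y_{t+1})/\rho_t - \chi K(x_t - x_{t+1}) + B(w_t - w_{t+1})$,
the optimality conditions become

\[ \begin{aligned} &\langle \nabla G(x_t^{md}) + \eta_t(x_{t+1} - x_t), x_{t+1} - x \rangle + \left\langle \left( \frac{\theta_t}{\rho_t} - 1 \right)(y_t - y_{t+1}) - y_{t+1}, -K(x_{t+1} - x) \right\rangle \\ &\quad + \chi\theta_t\langle K(x_t - x_{t+1}), K(x_{t+1} - x) \rangle + \theta_t\langle B(w_t - w_{t+1}), -K(x_{t+1} - x) \rangle \le 0, \text{ and} \\ &F(w_{t+1}) - F(w) + \left\langle \left( \frac{\tau_t}{\rho_t} - 1 \right)(y_t - y_{t+1}) - y_{t+1}, B(w_{t+1} - w) \right\rangle \le 0. \end{aligned} \]

Therefore,

\[ \begin{aligned} &\langle \nabla G(x_t^{md}), x_{t+1} - x \rangle + F(w_{t+1}) - F(w) - \langle y, Bw_{t+1} - Kx_{t+1} - b \rangle + \langle y_{t+1}, Bw - Kx - b \rangle \\ &\le \langle \eta_t(x_t - x_{t+1}), x_{t+1} - x \rangle + \langle y_{t+1} - y, Bw_{t+1} - Kx_{t+1} - b \rangle \\ &\quad - \left\langle \left( \frac{\theta_t}{\rho_t} - 1 \right)(y_t - y_{t+1}), -K(x_{t+1} - x) \right\rangle - \left\langle \left( \frac{\tau_t}{\rho_t} - 1 \right)(y_t - y_{t+1}), B(w_{t+1} - w) \right\rangle \\ &\quad + \chi\theta_t\langle K(x_{t+1} - x_t), K(x_{t+1} - x) \rangle + \theta_t\langle B(w_{t+1} - w_t), -K(x_{t+1} - x) \rangle. \end{aligned} \tag{2.19} \]

Three observations on the right hand side of (2.19) are in order. Firstly, by (2.6) we have

\[ \begin{aligned} &\langle \eta_t(x_t - x_{t+1}), x_{t+1} - x \rangle + \langle y_{t+1} - y, Bw_{t+1} - Kx_{t+1} - b \rangle \\ &= \eta_t\langle x_t - x_{t+1}, x_{t+1} - x \rangle + \frac{1}{\rho_t}\langle y_{t+1} - y, y_t - y_{t+1} \rangle \\ &= \frac{\eta_t}{2}\left( \|x_t - x\|^2 - \|x_{t+1} - x\|^2 \right) - \frac{\eta_t}{2}\|x_t - x_{t+1}\|^2 + \frac{1}{2\rho_t}\left( \|y_t - y\|^2 - \|y_{t+1} - y\|^2 - \|y_t - y_{t+1}\|^2 \right), \end{aligned} \tag{2.20} \]

and secondly, by (2.6) we can see that

\[ B(w_{t+1} - w) = \frac{1}{\rho_t}(y_t - y_{t+1}) + (Kx_{t+1} - Kx) - (Bw - Kx - b), \tag{2.21} \]

and

\[ \begin{aligned} &\left\langle \left( \frac{\theta_t}{\rho_t} - 1 \right)(y_t - y_{t+1}), K(x_{t+1} - x) \right\rangle - \left\langle \left( \frac{\tau_t}{\rho_t} - 1 \right)(y_t - y_{t+1}), \frac{1}{\rho_t}(y_t - y_{t+1}) + (Kx_{t+1} - Kx) \right\rangle \\ &= \frac{\tau_t - \theta_t}{\rho_t}\langle y_t - y_{t+1}, -K(x_{t+1} - x) \rangle - \frac{\tau_t - \rho_t}{\rho_t^2}\|y_t - y_{t+1}\|^2 \\ &= \frac{\tau_t - \theta_t}{2}\left[ \frac{1}{\rho_t^2}\|y_t - y_{t+1}\|^2 + \|K(x_{t+1} - x)\|^2 - \left\| \frac{1}{\rho_t}(y_t - y_{t+1}) + K(x_{t+1} - x) \right\|^2 \right] - \frac{\tau_t - \rho_t}{\rho_t^2}\|y_t - y_{t+1}\|^2 \\ &= \frac{\tau_t - \theta_t}{2}\left[ \frac{1}{\rho_t^2}\|y_t - y_{t+1}\|^2 + \|K(x_{t+1} - x)\|^2 - \|Bw_{t+1} - Kx - b\|^2 \right] - \frac{\tau_t - \rho_t}{\rho_t^2}\|y_t - y_{t+1}\|^2. \end{aligned} \tag{2.22} \]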


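Thirdly, from (2.18) we have

\[ \begin{aligned} &\chi\theta_t\langle K(x_{t+1} - x_t), K(x_{t+1} - x) \rangle + \theta_t\langle B(w_{t+1} - w_t), -K(x_{t+1} - x) \rangle \\ &= -\frac{\chi\theta_t}{2}\left( \|K(x_t - x)\|^2 - \|K(x_{t+1} - x)\|^2 - \|K(x_t - x_{t+1})\|^2 \right) \\ &\quad + \frac{\theta_t}{2}\left( \|Bw_t - Kx - b\|^2 - \|Bw_{t+1} - Kx - b\|^2 + \|Bw_{t+1} - Kx_{t+1} - b\|^2 - \|Bw_t - Kx_{t+1} - b\|^2 \right) \\ &\le -\frac{\chi\theta_t}{2}\left( \|K(x_t - x)\|^2 - \|K(x_{t+1} - x)\|^2 \right) + \frac{\chi\theta_t\|K\|^2}{2}\|x_t - x_{t+1}\|^2 \\ &\quad + \frac{\theta_t}{2}\left( \|Bw_t - Kx - b\|^2 - \|Bw_{t+1} - Kx - b\|^2 \right) + \frac{\theta_t}{2\rho_t^2}\|y_t - y_{t+1}\|^2 - \frac{\theta_t}{2}\|Bw_t - Kx_{t+1} - b\|^2, \end{aligned} \tag{2.23} \]

where the last inequality results from (2.6) and the fact that

\[ \chi\|K(x_t - x_{t+1})\| \le \chi\|K\|\|x_t - x_{t+1}\|. \tag{2.24} \]

Applying (2.19) - (2.23) to (2.17), we have

\[ \begin{aligned} &\frac{1}{\Gamma_t}Q(z; z_{t+1}^{ag}) - \frac{1 - \alpha_t}{\Gamma_t}Q(z; z_t^{ag}) \\ &\le \frac{\alpha_t}{\Gamma_t}\Big\{ \frac{\eta_t}{2}\left( \|x_t - x\|^2 - \|x_{t+1} - x\|^2 \right) + \frac{1}{2\rho_t}\left( \|y_t - y\|^2 - \|y_{t+1} - y\|^2 \right) - \frac{\tau_t - \rho_t}{2\rho_t^2}\|y_t - y_{t+1}\|^2 \\ &\qquad + \frac{\theta_t}{2}\|Bw_t - Kx - b\|^2 - \frac{\tau_t}{2}\|Bw_{t+1} - Kx - b\|^2 - \frac{\chi\theta_t}{2}\left( \|K(x_t - x)\|^2 - \|K(x_{t+1} - x)\|^2 \right) \\ &\qquad + \left\langle \left( \frac{\tau_t}{\rho_t} - 1 \right)(y_t - y_{t+1}), Bw - Kx - b \right\rangle + \frac{\tau_t - \theta_t}{2}\|K(x_{t+1} - x)\|^2 - \frac{\theta_t}{2}\|Bw_t - Kx_{t+1} - b\|^2 \\ &\qquad - \frac{1}{2}\left( \eta_t - L_G\alpha_t - \chi\theta_t\|K\|^2 \right)\|x_t - x_{t+1}\|^2 \Big\}. \end{aligned} \tag{2.25} \]

Letting $w = w^*$ and $x = x^*$ in the above, observing from (2.12) that $1/\Gamma_{t-1} = (1 - \alpha_t)/\Gamma_t$ whenever $\alpha_t < 1$, in view of (2.14) and
applying the above inequality inductively, we conclude (2.13). □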

There are two major consequences of Lemma 2.3, where we write $z = (w^*, x^*, y)$. If $\alpha_t = 1$ for all $t$, then the left hand side of (2.13)
becomes $\frac{1}{\Gamma_1}\sum_{i=2}^{t+1} Q(z; z_i^{ag})$. On the other hand, if $\alpha_t \in [0, 1)$ for all $t$, then in view of (2.12), the left hand side
of (2.13) is $Q(z; z_{t+1}^{ag})/\Gamma_t$. This difference is the main reason why we can accelerate the rate of convergence of
AADMM in terms of $L_G$.

In  the  next  lemma,  we  provide  possible  bounds  of  £>(•,  ■,  •)  in  Lemma  2.3. 

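Lemma 2.4. Suppose that $\mathcal{V}$ is any vector space and $V \subset \mathcal{V}$ is any convex set. For any $v \in V$, $v_{[t+1]} \subset \mathcal{V}$ and $\gamma_{[t]} \subset \mathbb{R}$, we have the following:

a). If the sequence $\{\alpha_i\gamma_i/\Gamma_i\}$ is decreasing, then

$$
B_t(v, v_{[t+1]}, \gamma_{[t]}) \le \frac{\alpha_1\gamma_1}{2\Gamma_1}\|v_1 - v\|^2 - \frac{\alpha_t\gamma_t}{2\Gamma_t}\|v_{t+1} - v\|^2. \tag{2.26}
$$

b). If the sequence $\{\alpha_i\gamma_i/\Gamma_i\}$ is increasing, $V$ is bounded and $v_{[t+1]} \subset V$, then

$$
B_t(v, v_{[t+1]}, \gamma_{[t]}) \le \frac{\alpha_t\gamma_t}{2\Gamma_t}D_V^2 - \frac{\alpha_t\gamma_t}{2\Gamma_t}\|v_{t+1} - v\|^2. \tag{2.27}
$$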

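Proof. By (2.14) we have

$$
B_t(v, v_{[t+1]}, \gamma_{[t]}) = \frac{\alpha_1\gamma_1}{2\Gamma_1}\|v_1 - v\|^2 - \sum_{i=1}^{t-1}\left(\frac{\alpha_i\gamma_i}{2\Gamma_i} - \frac{\alpha_{i+1}\gamma_{i+1}}{2\Gamma_{i+1}}\right)\|v_{i+1} - v\|^2 - \frac{\alpha_t\gamma_t}{2\Gamma_t}\|v_{t+1} - v\|^2.
$$

If the sequence $\{\alpha_i\gamma_i/\Gamma_i\}$ is decreasing, then the above equation implies (2.26). If the sequence is increasing, $V$ is bounded and $v_{[t+1]} \subset V$, then from the above equation we have

$$
\begin{aligned}
B_t(v, v_{[t+1]}, \gamma_{[t]}) &\le \frac{\alpha_1\gamma_1}{2\Gamma_1}D_V^2 - \sum_{i=1}^{t-1}\left(\frac{\alpha_i\gamma_i}{2\Gamma_i} - \frac{\alpha_{i+1}\gamma_{i+1}}{2\Gamma_{i+1}}\right)D_V^2 - \frac{\alpha_t\gamma_t}{2\Gamma_t}\|v_{t+1} - v\|^2 \\
&= \frac{\alpha_t\gamma_t}{2\Gamma_t}D_V^2 - \frac{\alpha_t\gamma_t}{2\Gamma_t}\|v_{t+1} - v\|^2,
\end{aligned}
$$

hence (2.27) holds. □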
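The two bounds of Lemma 2.4 can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $B_t$ to be the weighted telescoping sum $\sum_{i=1}^t \frac{\alpha_i\gamma_i}{2\Gamma_i}(\|v_i - v\|^2 - \|v_{i+1} - v\|^2)$, draws hypothetical points in the unit ball (so that $D_V \le 2$), and uses two made-up weight sequences, one decreasing and one increasing. The function name `B_t` and all test data are illustrative choices.

```python
import numpy as np

def B_t(v, vs, weights):
    """Weighted telescoping sum, assumed form of B_t in (2.14).

    vs has shape (t+1, dim) and holds v_1, ..., v_{t+1};
    weights[i] stands for alpha_i * gamma_i / (2 * Gamma_i).
    """
    d = np.sum((vs - v) ** 2, axis=1)          # ||v_i - v||^2 for i = 1..t+1
    return float(np.sum(weights * (d[:-1] - d[1:])))

rng = np.random.default_rng(0)
t, dim = 50, 3
vs = rng.normal(size=(t + 1, dim))
vs /= np.maximum(1.0, np.linalg.norm(vs, axis=1, keepdims=True))  # project into unit ball
v = 0.5 * vs[0]                                 # a point of the same ball
D_V = 2.0                                       # diameter of the unit ball
d = np.sum((vs - v) ** 2, axis=1)
i = np.arange(1, t + 1, dtype=float)

# Case a): decreasing weights, bound (2.26).
w_dec = 1.0 / (2.0 * i)
lhs_a = B_t(v, vs, w_dec)
rhs_a = w_dec[0] * d[0] - w_dec[-1] * d[-1]

# Case b): increasing weights, bound (2.27).
w_inc = i / 2.0
lhs_b = B_t(v, vs, w_inc)
rhs_b = w_inc[-1] * D_V ** 2 - w_inc[-1] * d[-1]
```

The increasing case is the one used later with weights proportional to $t$, which is why boundedness of $V$ is needed there.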

2.3. Convergence results on solving UCO problems in bounded domain. We study UCO problems with bounded feasible sets in this subsection. In particular, throughout this subsection we assume that

Both $X$ and $Y := \operatorname{dom} F^*$ are compact, and $B = I$, $b = 0$. (2.28)

It should be noted that the boundedness of $Y$ above is equivalent to the Lipschitz continuity of $F(\cdot)$ (see, e.g., Corollary 13.3.3 in [45]).

The following Theorem 2.5 generalizes the convergence properties of ADMM algorithms. Although the convergence of ADMM, L-ADMM and P-ADMM has already been analyzed in several works (e.g., [34, 25, 10, 42]), Theorem 2.5 gives a unified view of the convergence properties of all ADMM algorithms.

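Theorem 2.5. In AADMM, if the parameters are set to $\alpha_t \equiv 1$, $\theta_t \equiv \tau_t \equiv \rho_t \equiv \rho$ and $\eta_t \equiv L_G + \chi\rho\|K\|^2$, then

$$
G(x^{t+1}) + F(Kx^{t+1}) - f^* \le \frac{L_G D_X^2}{2t} + \frac{\chi\rho}{2t}\|K\|^2D_X^2 + \frac{(1-\chi)\rho}{2t}D_{X,K}^2 + \frac{D_Y^2}{2\rho t}, \tag{2.29}
$$

where $x^{t+1} := \frac{1}{t}\sum_{i=2}^{t+1} x_i$. In particular, if $\rho$ is given by

$$
\rho = \frac{D_Y}{\chi\|K\|D_X + (1-\chi)D_{X,K}}, \tag{2.30}
$$

then

$$
G(x^{t+1}) + F(Kx^{t+1}) - f^* \le \frac{L_G D_X^2}{2t} + \frac{\chi\|K\|D_XD_Y + (1-\chi)D_{X,K}D_Y}{t}. \tag{2.31}
$$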


Proof. Since $\alpha_t \equiv 1$, by (2.3), (2.5) and (2.7) we have $x_t^{ag} = x_t$, $w_t^{ag} = w_t$ and $y_t^{ag} = y_t$, and we can see that $\Gamma_t \equiv 1$ satisfies (2.12). Applying the parameter settings to the RHS of (2.13) in Lemma 2.3, we have


V 


Bt(x*,x[t+1],V[t])  =  A(||Xl  ^  z*]]2  -  H^+r  -  z*||2) 


<  ^Dl,  +  f\\KTDl  -  f  \\K\\2\\xt+1  -  x*||2, 


Bt(w*,w[t+1],0[t])  =  |(|K  -  iu*||2  -  |K+1  -  w* ||2)  <  ^  =  — f 
-XBt{Kx*,Kx[t+1],0[t])  =  -  ?f(\\KXl  -  Kx* ||2  -  ||ATzt+1  -  Kx* ||2) 

<  ^(IIj/i  -2/II2  -  ht+i  -y\\2)  <  Vy  e  Y. 


it 


Therefore,  by  Lemma  2.3  we  have 
t+ 1 


Y,Q(w*,x*,y,Zi)  <  ^-D\,  +  \\KfDl .  +  {-L^kD2x,tK 


i—2 
2  x  2 


D2 

UY 

2  P 


<  +  ^|| KfDl  +  {2-^Dl S  V„  6  Y. 


Furthermore, noticing that for all $y \in Y$, by the convexity of $Q(w^*, x^*, y; \cdot)$,

$$
Q(w^*, x^*, y; z^{t+1}) \le \frac{1}{t}\sum_{i=2}^{t+1} Q(w^*, x^*, y; z_i), \quad\text{where } z^{t+1} := \frac{1}{t}\sum_{i=2}^{t+1} z_i.
$$

Applying the two inequalities above to (2.11) and Proposition 2.2, we conclude (2.29), and (2.31) follows immediately. □
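The choice of $\rho$ in (2.30) can be motivated by a one-line minimization. The sketch below assumes that the $\rho$-dependent part of the bound (2.29) is proportional to $\chi\rho\|K\|^2D_X^2 + (1-\chi)\rho D_{X,K}^2 + D_Y^2/\rho$ with $\chi \in \{0, 1\}$; under that assumption the minimizer over $\rho > 0$ is $D_Y/(\chi\|K\|D_X + (1-\chi)D_{X,K})$ and the minimum value is $2D_Y(\chi\|K\|D_X + (1-\chi)D_{X,K})$. The function names and the numerical values are hypothetical.

```python
def rho_dependent_part(rho, chi, norm_K, D_X, D_XK, D_Y):
    """Assumed rho-dependent bracket of the bound (2.29)."""
    return chi * rho * norm_K**2 * D_X**2 + (1 - chi) * rho * D_XK**2 + D_Y**2 / rho

def rho_star(chi, norm_K, D_X, D_XK, D_Y):
    """Stepsize (2.30): balances the terms linear in rho against D_Y^2 / rho."""
    return D_Y / (chi * norm_K * D_X + (1 - chi) * D_XK)

# Hypothetical problem constants, for illustration only.
norm_K, D_X, D_XK, D_Y = 3.0, 2.0, 4.0, 5.0
results = {}
for chi in (0, 1):
    r = rho_star(chi, norm_K, D_X, D_XK, D_Y)
    best = rho_dependent_part(r, chi, norm_K, D_X, D_XK, D_Y)
    nearby = [rho_dependent_part(s * r, chi, norm_K, D_X, D_XK, D_Y) for s in (0.5, 0.9, 1.1, 2.0)]
    results[chi] = (r, best, nearby)
```

For $\chi = 0$ the minimum equals $2D_{X,K}D_Y$ and for $\chi = 1$ it equals $2\|K\|D_XD_Y$, which is where the $D_{X,K}D_Y$ and $\|K\|D_XD_Y$ terms in (2.31) come from.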

Although AADMM unifies all ADMM algorithms, what makes it most special is the variable weighting sequence $\{\alpha_t\}_{t\ge 1}$ (rather than $\alpha_t \equiv 1$) that accelerates its convergence rate with respect to its dependence on $L_G$, as shown in Theorem 2.6 below.

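Theorem 2.6. In AADMM, if the parameters are set to

$$
\alpha_t = \frac{2}{t+1}, \quad \tau_t = \rho_t \equiv \rho, \quad \theta_t = \frac{(t-1)\rho}{t}, \quad\text{and}\quad \eta_t = \frac{2L_G + \chi\rho t\|K\|^2}{t}, \tag{2.32}
$$

then

$$
G(x_{t+1}^{ag}) + F(Kx_{t+1}^{ag}) - f^* \le \frac{2L_GD_X^2}{t(t+1)} + \frac{1}{t+1}\left[\chi\rho\|K\|^2D_X^2 + (1-\chi)\rho D_{X,K}^2 + \frac{D_Y^2}{\rho}\right]. \tag{2.33}
$$

In particular, if $\rho$ is given by (2.30), then

$$
G(x_{t+1}^{ag}) + F(Kx_{t+1}^{ag}) - f^* \le \frac{2L_GD_X^2}{t(t+1)} + \frac{2}{t+1}\left[\chi\|K\|D_XD_Y + (1-\chi)D_{X,K}D_Y\right]. \tag{2.34}
$$

Proof. It is clear that

$$
\alpha_t = \frac{2}{t+1} \text{ and } \Gamma_t = \frac{2}{t(t+1)} \text{ satisfy (2.12), and } \frac{\alpha_t}{\Gamma_t} = t. \tag{2.35}
$$

By the parameter setting (2.32) and the definition of $B_t(\cdot,\cdot,\cdot)$ in (2.14), it is easy to calculate that

$$
\eta_t - L_G\alpha_t - \chi\theta_t\|K\|^2 \ge 0, \quad \tau_t \ge \theta_t,
$$

$$
B_t(w^*, w_{[t+1]}, \theta_{[t]}) - \sum_{i=1}^{t}\frac{\alpha_i(\tau_i-\theta_i)}{2\Gamma_i}\|w_{i+1} - w^*\|^2 = -\frac{\rho t}{2}\|w_{t+1} - w^*\|^2 \le 0,
$$

$$
\begin{aligned}
-\chi B_t(Kx^*, Kx_{[t+1]}, \theta_{[t]}) + \sum_{i=1}^{t}\frac{\alpha_i(\tau_i-\theta_i)}{2\Gamma_i}\|Kx_{i+1} - Kx^*\|^2
&= \frac{\chi\rho t}{2}\|Kx_{t+1} - Kx^*\|^2 + \frac{(1-\chi)\rho}{2}\sum_{i=1}^{t}\|Kx_{i+1} - Kx^*\|^2 \\
&\le \frac{\chi\rho t}{2}\|K\|^2\|x_{t+1} - x^*\|^2 + \frac{(1-\chi)\rho t}{2}D_{X,K}^2.
\end{aligned}
$$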

Moreover,  by  (2.4),  (2.6)  and  Moreau’s  decomposition  theorem  (see,  e.g.,  [37,  If,  18]),  we  have 
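$$
\begin{aligned}
y_{t+1} &= y_t - \rho(w_{t+1} - Kx_{t+1} - b) \\
&= (y_t + \rho Kx_{t+1} + \rho b) - \rho\operatorname*{argmin}_{w\in W}\left\{F(w) + \frac{\rho}{2}\left\|w - \frac{1}{\rho}(y_t + \rho Kx_{t+1} + \rho b)\right\|^2\right\} \\
&= \operatorname*{argmin}_{y\in Y}\left\{F^*(y) + \frac{1}{2\rho}\left\|y - (y_t + \rho Kx_{t+1} + \rho b)\right\|^2\right\},
\end{aligned}
\tag{2.36}
$$

which implies that $y_{[t+1]} \subset Y$. Using this observation together with the fact that $\alpha_t/(\Gamma_t\rho_t) = t/\rho$, and applying (2.27) in Lemma 2.4, we obtain

$$
B_t(y, y_{[t+1]}, \rho_{[t]}^{-1}) \le \frac{t}{2\rho}D_Y^2, \quad \forall y \in Y.
$$

Finally, noting that $\alpha_t\eta_t/\Gamma_t = 2L_G + \chi\rho t\|K\|^2$, by (2.27) in Lemma 2.4 we have

$$
B_t(x^*, x_{[t+1]}, \eta_{[t]}) \le \frac{2L_G + \chi\rho t\|K\|^2}{2}\left(D_X^2 - \|x_{t+1} - x^*\|^2\right) \le L_GD_X^2 + \frac{\chi\rho t}{2}\|K\|^2\left(D_X^2 - \|x_{t+1} - x^*\|^2\right).
$$

Applying all the above inequalities to (2.13) in Lemma 2.3, we have

$$
\frac{1}{\Gamma_t}Q(w^*, x^*, y; z_{t+1}^{ag}) \le L_GD_X^2 + \frac{\chi\rho t}{2}\|K\|^2D_X^2 + \frac{(1-\chi)\rho t}{2}D_{X,K}^2 + \frac{t}{2\rho}D_Y^2, \quad \forall y \in Y.
$$

Using (2.35) and applying Proposition 2.2, we conclude (2.33), and (2.34) comes from (2.30) and (2.33). □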
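The identities collected in (2.35) are easy to confirm with exact rational arithmetic. The sketch below assumes that the recursion (2.12) reads $\Gamma_1 = 1$ and $\Gamma_t = (1-\alpha_t)\Gamma_{t-1}$ for $t > 1$ (that equation is not restated in this section, so this form is an assumption), and checks the closed form $\Gamma_t = 2/(t(t+1))$ together with $\alpha_t/\Gamma_t = t$.

```python
from fractions import Fraction

def alpha(t):
    """Weight alpha_t = 2 / (t + 1) from the parameter setting (2.32)."""
    return Fraction(2, t + 1)

def gamma_closed_form(t):
    """Claimed closed form Gamma_t = 2 / (t (t + 1))."""
    return Fraction(2, t * (t + 1))

# Assumed recursion (2.12): Gamma_1 = 1, Gamma_t = (1 - alpha_t) * Gamma_{t-1}.
gamma = Fraction(1)
rows = []
for t in range(1, 201):
    if t > 1:
        gamma = (1 - alpha(t)) * gamma
    rows.append((t, gamma, alpha(t) / gamma))
```

Since $\alpha_t/\Gamma_t = t$ grows linearly, the weights multiplying the distance terms in Lemma 2.4 are increasing, which is the case covered by (2.27).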

In view of Theorems 2.5 and 2.6, several remarks on the AADMM algorithms are in order. Firstly, Theorem 2.6 provides an example of choosing stepsizes in AL-ADMM and ALP-ADMM that leads to better convergence properties, with respect to the dependence on $L_G$, than L-ADMM and LP-ADMM respectively. In particular, AL-ADMM and ALP-ADMM allow $L_G$ to be as large as $\Omega(N)$ without affecting the rate of convergence (up to a constant factor). The comparison of these AADMM algorithms in terms of their rates of convergence is shown in Table 2.1. Secondly, ALP-ADMM has the same rate of convergence as Nesterov's smoothing scheme [40], and achieves the optimal rate of convergence (1.16). Moreover, we can see from (2.36) that the APD method in [13] is equivalent to ALP-ADMM. Nonetheless, AL-ADMM has a better constant in the estimate of the rate of convergence than both ALP-ADMM and Nesterov's smoothing scheme, since $D_{X,K} \le \|K\|D_X$. However, the computational time for solving problem (2.2) with $\chi = 0$ is usually higher than that for $\chi = 1$, hence AL-ADMM has a higher iteration cost than ALP-ADMM. The trade-off between better rate constants and cheaper iteration costs has to be considered in practice. Thirdly, while Theorem 2.5 describes only the ergodic convergence of the ADMM algorithms, Theorem 2.6 describes the convergence of the aggregate sequences $\{z_{t+1}^{ag}\}_{t\ge 1}$, which are exactly the outputs of the accelerated schemes. Finally, in ADMM methods we have $\tau_t = \rho_t = \theta_t$, while in Theorem 2.6 we only have $\tau_t = \rho_t$, although $\theta_t \to \rho_t$ when $t \to \infty$. In fact, if the total number of iterations is given, it is possible to choose a set of equal stepsize parameters, as described by Theorem 2.7 below.

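Theorem 2.7. In AADMM, if the total number of iterations $N$ is chosen, and the parameters are set to

$$
\alpha_t = \frac{2}{t+1}, \quad \theta_t = \tau_t = \rho_t = \frac{\rho N}{t}, \quad\text{and}\quad \eta_t = \frac{2L_G + \chi\rho N\|K\|^2}{t},
$$

where $\rho$ is given by (2.30), then

$$
G(x_N^{ag}) + F(Kx_N^{ag}) - f^* \le \frac{2L_GD_X^2}{N(N-1)} + \frac{2}{N-1}\left[\chi\|K\|D_XD_Y + (1-\chi)D_{X,K}D_Y\right]. \tag{2.37}
$$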
Proof. Using equation (2.35) as well as the definition of $B_t(\cdot,\cdot,\cdot)$ in (2.14), it is easy to calculate that


T]t  -  LGat  -  x6»t||A'||2  >  0, 

K  f  *  \  2LG  +  XPN\\K\\2  nl  *|,2  II  *  m2\ 

Bt(x  ,x[t+1  ],V[t})=  - o - (11*1- 37  II  -  ll*t+ 1-*  II  ) 


< 


2La±2^ml(Dl__]]xt+i_x.n 

pN„, . ,ll2  „ . ,ll2.  ,  pNDl.  _  pNDi.  K 


Bt(w*,w[t+1],e[t])=  —(\\w1~w*\\  -  \\wt+i -w*||  )  <  2 

-XBt(Kx\Kx\M],<l ,,,)  =  -  ^fi(\\Kx,  -  Kx’f  -  \\Kxt+i  -  Kx' ||2) 

<_XfD2,K+>«||„+1_x.||, 

On  the  other  hand ,  noting  that  at/(Ttpt)  =  t2/(pN),  by  (2.27)  in  Lemma  2-4  we  have 


and 


t202  N 

*"2^  ^  ^  Y  <  ^D2Yl  \/y€Y,Vt<N. 


Bt(y,y[t+i],p[t]  )  <  pPy  -  \\yt+i-y*\\  )  <  <  ypdy 

Applying  all  the  above  inequalities  to  (2.13)  in  Lemma  2.3 ,  we  conclude 


XpN , 


(1  -  X)PN 


-Q{w%x*,y,zat3+1)  <  LgD2,  +  ^L\\K\\2D2x,  +  g 


N 

Dx*,k  + 

2  p 


Y ) 


XPN, 


(1  -  x)pN 


N 


<  LGD2x  +  \\K\\2D2x  +  D2xk  +  -Dfi. 


Setting $t = N - 1$, and applying (2.35), (2.30) and the above inequality to Proposition 2.2, we obtain (2.37). □


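Table 2.1: Rates of convergence of instances of AADMM for solving UCO with bounded feasible set

                   | No preconditioning ($\chi = 0$)                          | Preconditioned ($\chi = 1$)
ADMM               | $O\left(\frac{D_{X,K}D_Y}{t}\right)$                      | $O\left(\frac{\|K\|D_XD_Y}{t}\right)$
Linearized ADMM    | $O\left(\frac{L_GD_X^2}{t} + \frac{D_{X,K}D_Y}{t}\right)$ | $O\left(\frac{L_GD_X^2}{t} + \frac{\|K\|D_XD_Y}{t}\right)$
Accelerated        | $O\left(\frac{L_GD_X^2}{t^2} + \frac{D_{X,K}D_Y}{t}\right)$ | $O\left(\frac{L_GD_X^2}{t^2} + \frac{\|K\|D_XD_Y}{t}\right)$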


2.4. Convergence results on solving AECCO problems. In this section, we study the rate of convergence of AADMM for solving general AECCO problems without a boundedness assumption on either $X$ or $Y$, in terms of both primal and feasibility residuals. We start with the convergence analysis of ADMM algorithms as a special case of AADMM where $\alpha_t \equiv 1$ and $\theta_t = \tau_t = \rho_t \equiv \rho$.

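Theorem 2.8. In AADMM, if $\alpha_t \equiv 1$, $\theta_t = \tau_t = \rho_t \equiv \rho$ and $\eta_t \equiv \eta \ge L_G + \chi\rho\|K\|^2$, then

$$
G(x^{t+1}) + F(w^{t+1}) - f^* \le \frac{1}{2t}\left(\eta D_{x^*}^2 + \rho(1-\chi)D_{x^*,K}^2\right) \tag{2.38}
$$

and

$$
\|Bw^{t+1} - Kx^{t+1} - b\|^2 \le \frac{2}{t^2}\left(\frac{2D_{y^*}^2}{\rho^2} + \frac{\eta D_{x^*}^2}{\rho} + (1-\chi)D_{x^*,K}^2\right), \tag{2.39}
$$

where $x^{t+1} := \frac{1}{t}\sum_{i=2}^{t+1} x_i$ and $w^{t+1} := \frac{1}{t}\sum_{i=2}^{t+1} w_i$. In particular, if $\rho = 1$ and $\eta = L_G + \chi\|K\|^2$, then

$$
G(x^{t+1}) + F(w^{t+1}) - f^* \le \frac{1}{2t}\left(L_GD_{x^*}^2 + \chi\|K\|^2D_{x^*}^2 + (1-\chi)D_{x^*,K}^2\right) \tag{2.40}
$$

and

$$
\|Bw^{t+1} - Kx^{t+1} - b\| \le \frac{\sqrt{2L_G}\,D_{x^*}}{t} + \frac{\chi\sqrt{2}\|K\|D_{x^*}}{t} + \frac{(1-\chi)\sqrt{2}D_{x^*,K}}{t} + \frac{2D_{y^*}}{t}. \tag{2.41}
$$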

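Proof. Similarly to the proof of Theorem 2.5, we have

$$
Q(w^*, x^*, y; z^{t+1}) \le \frac{1}{2t}\left[L_GD_{x^*}^2 + \chi\rho\|K\|^2D_{x^*}^2 + (1-\chi)\rho D_{x^*,K}^2 + \frac{1}{\rho}\left(\|y_1 - y\|^2 - \|y_{t+1} - y\|^2\right)\right] \tag{2.42}
$$

$$
\le \frac{1}{2t}\left[L_GD_{x^*}^2 + \chi\rho\|K\|^2D_{x^*}^2 + (1-\chi)\rho D_{x^*,K}^2\right] - \left\langle\frac{1}{\rho t}(y_1 - y_{t+1}), y\right\rangle, \tag{2.43}
$$

where $z^{t+1} = \frac{1}{t}\sum_{i=2}^{t+1} z_i$. Noting that $Q(z^*; z^{t+1}) \ge 0$, by (2.42) we have

$$
\|y_{t+1} - y^*\|^2 \le \rho L_GD_{x^*}^2 + \chi\rho^2\|K\|^2D_{x^*}^2 + (1-\chi)\rho^2D_{x^*,K}^2 + D_{y^*}^2,
$$

hence if we let $v_{t+1} = (y_1 - y_{t+1})/(\rho t)$, then we have

$$
\|v_{t+1}\|^2 \le \frac{2}{\rho^2t^2}\left(\|y_1 - y^*\|^2 + \|y_{t+1} - y^*\|^2\right) \le \frac{2}{t^2}\left(\frac{L_GD_{x^*}^2}{\rho} + \chi\|K\|^2D_{x^*}^2 + (1-\chi)D_{x^*,K}^2 + \frac{2}{\rho^2}D_{y^*}^2\right).
$$

Furthermore, by (2.43) we have

$$
g(v_{t+1}, z^{t+1}) \le \frac{1}{2t}\left[L_GD_{x^*}^2 + \chi\rho\|K\|^2D_{x^*}^2 + (1-\chi)\rho D_{x^*,K}^2\right].
$$

Applying the two inequalities above to Proposition 2.1 we obtain (2.38) and (2.39). The results in (2.40) and (2.41) then follow immediately. □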

From Theorem 2.8 we see that, for ADMM algorithms, the rates of convergence of both primal and feasibility residuals are of order $O(1/t)$. The detailed rate of convergence of each algorithm is listed in Tables 2.2 and 2.3. We observe that a larger value of $\rho$ will increase the right-hand side of (2.38), but decrease that of (2.39). Hence, an "optimal" selection of $\rho$ will be determined by considering both primal and feasibility residuals together. For the sake of simplicity, we set $\rho = 1$.

In Theorem 2.9 below, we show that there exists a weighting sequence $\{\alpha_t\}_{t\ge 1}$ that improves the rate of convergence of Algorithm 2 in terms of its dependence on $L_G$.

Theorem 2.9. In AADMM, if the total number of iterations is set to $N$, and the parameters are set to

$$
\alpha_t = \frac{2}{t+1}, \quad \theta_t = \tau_t = \rho_t = \frac{N}{t}, \quad\text{and}\quad \eta_t = \frac{2L_G + \chi N\|K\|^2}{t}, \tag{2.44}
$$

then


G(xaN°)  +  F(waN3)-f*< 


1 


2  LgDl.  _ 

N(N-l)  2(N-1) 


[x\\k\\2D2x*  +  (1  -  x)dI*,k\  , 


(2.45) 


and 


11^  -  K,‘‘  -  h  <  +  +  ,  *>, 


(Ar-i)VF  n-i  '  n-  l 

Proof.  Using  equations  (2.44),  (2.35)  and  (2.14),  we  can  calculate  that 
Vt  -  LGat  -  x0t||A"||2  >0,  rt>  pt  for  all  t  <  N , 

Bt(x  ,x[t+1],ri[t])  =  - - - (Dx.  -  ||xt+i  -  x  ||  ) 


N  —  1 


N 


Bt(y,y[t+i],p[t])  = -(\\yi-y\\  -\\yt+i-y\\  ),Vyey, 

Bt(Bw*,Bw[t+1],9[t])  =  ydl BWl  -  Bw* II2  -  ||Bt+1  -  Awl2)  <  yD2,>B  =  yl?2.,*, 

-xBt(A-x*,if*[t+1],0[t])  =  -  fiWKx,  -  iCc*||2  -  || Kxt+1  -  A'x*||2), 

<  -^(Dx.iK-\\Kf\\xt+1-x*f). 

Applying  all  the  above  calculations  to  (2.13)  in  Lemma  2.3,  we  have 


(2.46) 


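$$
\frac{1}{\Gamma_t}Q(w^*, x^*, y; z_{t+1}^{ag}) \le L_GD_{x^*}^2 + \frac{\chi N}{2}\|K\|^2D_{x^*}^2 + \frac{(1-\chi)N}{2}D_{x^*,K}^2 + \frac{N}{2}\left(\|y_1 - y\|^2 - \|y_{t+1} - y\|^2\right), \quad \forall y \in \mathcal{Y}.
$$

Two consequences of the above estimate can be derived. Firstly, since $Q(z^*; z_{t+1}^{ag}) \ge 0$, we have

$$
\|y_{t+1} - y^*\|^2 \le \frac{2L_G}{N}D_{x^*}^2 + \chi\|K\|^2D_{x^*}^2 + (1-\chi)D_{x^*,K}^2 + D_{y^*}^2,
$$

and

$$
\|y_1 - y_{t+1}\|^2 \le 2\left(\|y_1 - y^*\|^2 + \|y_{t+1} - y^*\|^2\right) \le \frac{4L_G}{N}D_{x^*}^2 + 2\chi\|K\|^2D_{x^*}^2 + 2(1-\chi)D_{x^*,K}^2 + 4D_{y^*}^2.
$$

Secondly, since $\|y_1 - y\|^2 - \|y_{t+1} - y\|^2 = \|y_1\|^2 - \|y_{t+1}\|^2 - 2\langle y_1 - y_{t+1}, y\rangle \le -2\langle y_1 - y_{t+1}, y\rangle$,

$$
\frac{1}{\Gamma_t}Q(w^*, x^*, y; z_{t+1}^{ag}) + N\langle y_1 - y_{t+1}, y\rangle \le L_GD_{x^*}^2 + \frac{\chi N}{2}\|K\|^2D_{x^*}^2 + \frac{(1-\chi)N}{2}D_{x^*,K}^2, \quad \forall y \in \mathcal{Y}.
$$

Letting $t = N - 1$ and $v_N := 2(y_1 - y_{t+1})/(N-1)$, and applying (2.35) and the two above inequalities to Proposition 2.1, we obtain (2.45) and (2.46). □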

Comparing (2.40) and (2.41) with (2.45) and (2.46) respectively, AL-ADMM and ALP-ADMM are better than L-ADMM and LP-ADMM respectively, in terms of their rates of convergence of both primal and feasibility residuals. The rates of convergence of AADMM algorithms are outlined in Tables 2.2 and 2.3.
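Table 2.2: Rates of convergence of the primal residuals of AADMM instances for solving general AECCO

                   | No preconditioning ($\chi = 0$)                              | Preconditioned ($\chi = 1$)
ADMM               | $O\left(\frac{D_{x^*,K}^2}{N}\right)$                         | $O\left(\frac{\|K\|^2D_{x^*}^2}{N}\right)$
Linearized ADMM    | $O\left(\frac{L_GD_{x^*}^2 + D_{x^*,K}^2}{N}\right)$          | $O\left(\frac{L_GD_{x^*}^2 + \|K\|^2D_{x^*}^2}{N}\right)$
Accelerated        | $O\left(\frac{L_GD_{x^*}^2}{N^2} + \frac{D_{x^*,K}^2}{N}\right)$ | $O\left(\frac{L_GD_{x^*}^2}{N^2} + \frac{\|K\|^2D_{x^*}^2}{N}\right)$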

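Table 2.3: Rates of convergence of the feasibility residuals of AADMM instances for solving general AECCO

                   | No preconditioning ($\chi = 0$)                                            | Preconditioned ($\chi = 1$)
ADMM               | $O\left(\frac{D_{x^*,K} + D_{y^*}}{N}\right)$                               | $O\left(\frac{\|K\|D_{x^*} + D_{y^*}}{N}\right)$
Linearized ADMM    | $O\left(\frac{\sqrt{L_G}D_{x^*} + D_{x^*,K} + D_{y^*}}{N}\right)$           | $O\left(\frac{\sqrt{L_G}D_{x^*} + \|K\|D_{x^*} + D_{y^*}}{N}\right)$
Accelerated        | $O\left(\frac{\sqrt{L_G}D_{x^*}}{N^{3/2}} + \frac{D_{x^*,K} + D_{y^*}}{N}\right)$ | $O\left(\frac{\sqrt{L_G}D_{x^*}}{N^{3/2}} + \frac{\|K\|D_{x^*} + D_{y^*}}{N}\right)$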

2.5. A simple backtracking scheme. We have discussed the rate of convergence of Algorithm 2, with the assumption that both $L_G$ and $\|K\|$ are given. In practice, we may need backtracking techniques to estimate both constants. In this subsection, we propose a simple backtracking technique for AL-ADMM and ALP-ADMM.

From the proof of Lemma 2.3, we can see that if $L_G$ and $\|K\|$ in (2.15) and (2.24) are replaced by $L_t$ and $M_t$ respectively, i.e.,

$$
G(x_{t+1}^{ag}) \le G(x_t^{md}) + \langle\nabla G(x_t^{md}), x_{t+1}^{ag} - x_t^{md}\rangle + \frac{L_t}{2}\|x_{t+1}^{ag} - x_t^{md}\|^2 \quad\text{and} \tag{2.47}
$$

$$
\chi\|K(x_t - x_{t+1})\| \le \chi M_t\|x_t - x_{t+1}\|, \tag{2.48}
$$

then Lemma 2.3 still holds. On the other hand, to prove Theorems 2.5 through 2.9, in addition to Lemma 2.3, we require monotonicity of the sequences $\{\alpha_t\eta_t/\Gamma_t\}$, $\{\alpha_t\tau_t/\Gamma_t\}$, $\{\alpha_t\theta_t/\Gamma_t\}$ and $\{\alpha_t/(\Gamma_t\rho_t)\}$, and

$$
\eta_t - L_t\alpha_t - \chi\theta_tM_t^2 \ge 0. \tag{2.49}
$$

The monotonicity of these sequences is also used in Lemma 2.4, which helps to bound the distance terms on the RHS of (2.13) in Lemma 2.3. From these observations, we can simply use the following choice of parameters:

$$
\theta_t = \tau_t = \frac{\rho\nu_t\Gamma_t}{\alpha_t}, \quad \rho_t = \frac{\rho\alpha_t}{\nu_t\Gamma_t}, \quad \eta_t = L_t\alpha_t + \chi\theta_tM_t^2,
$$

where we assume that $\nu_{[t]}$ and $M_{[t]}$ are both monotone. It should be noted that the monotonicity of $\alpha_{[t]}\eta_{[t]}/\Gamma_{[t]}$ relies on $\{L_t\alpha_t^2/\Gamma_t\}$, which is trivial if we simply set $L_t\alpha_t^2 = \Gamma_t$. In addition, in view of the RHS of (2.13), we require $\tau_t \ge \rho_t$, i.e., $\nu_t \ge \alpha_t/\Gamma_t$. We summarize the discussion above in the simple backtracking procedure below.


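Procedure 1 Backtracking procedure for AL-ADMM and ALP-ADMM at the $t$-th iteration
0: procedure BACKTRACKING($L_{t-1}$, $M_{t-1}$, $\Gamma_{t-1}$, $\nu_{t-1}$, $x_t$, $x_t^{ag}$, $L_{\min}$)
1: $L_t \leftarrow \max\{L_{\min}, L_{t-1}/2\}$, $M_t \leftarrow M_{t-1}$ and $\nu_t \leftarrow \nu_{t-1}$. ▷ Initialization
2: Estimate $\alpha_t \in [0, 1]$ by solving the quadratic equation
       $L_t\alpha_t^2 = \Gamma_{t-1}(1 - \alpha_t)$, (2.50)
   and set $\Gamma_t \leftarrow \Gamma_{t-1}(1 - \alpha_t)$, $\nu_t \leftarrow \max\{\nu_{t-1}, \alpha_t/\Gamma_t\}$.
3: Choose stepsize parameters as
       $\theta_t = \tau_t = \frac{\rho\nu_t\Gamma_t}{\alpha_t}$, $\rho_t = \frac{\rho\alpha_t}{\nu_t\Gamma_t}$, and $\eta_t = \frac{\Gamma_t}{\alpha_t} + \chi\theta_tM_t^2$, (2.51)
   and calculate iterates (2.1) - (2.3).
4: if $G(x_{t+1}^{ag}) - G(x_t^{md}) - \langle\nabla G(x_t^{md}), x_{t+1}^{ag} - x_t^{md}\rangle > \frac{L_t}{2}\|x_{t+1}^{ag} - x_t^{md}\|^2$ then
5:     Set $L_t \leftarrow 2L_t$. Go to 2. ▷ Backtracking $L_G$
6: else if $\chi\|Kx_{t+1} - Kx_t\| > \chi M_t\|x_{t+1} - x_t\|$ then
7:     Set $M_t \leftarrow 2M_t$. Go to 2. ▷ Backtracking $\|K\|$
8: end if
9: return $L_t$, $M_t$, $\Gamma_t$, $\nu_t$, $x_{t+1}$, $x_{t+1}^{ag}$, $\tau_t$, $\rho_t$, $\alpha_t$
10: end procedure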
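Step 2 of the procedure only requires the root in $(0, 1]$ of a scalar quadratic. The sketch below is an illustration rather than the authors' implementation: it solves $L_t\alpha_t^2 = \Gamma_{t-1}(1-\alpha_t)$ in a numerically stable form and runs the recursion $\Gamma_t = \Gamma_{t-1}(1-\alpha_t)$ with hypothetical values $L_t$ drawn from $[L_{\min}, 2L_G]$, which stand in for the estimates that backtracking would return. The helper name `solve_alpha` and all constants are assumptions.

```python
import math
import random

def solve_alpha(L_t, Gamma_prev):
    """Root in (0, 1] of L_t * a^2 = Gamma_prev * (1 - a), cf. (2.50).

    Written as 2*Gamma / (Gamma + sqrt(Gamma^2 + 4*L*Gamma)) to avoid
    cancellation in the usual quadratic formula.
    """
    s = math.sqrt(Gamma_prev ** 2 + 4.0 * L_t * Gamma_prev)
    return 2.0 * Gamma_prev / (Gamma_prev + s)

random.seed(0)
L_min, L_G = 1.0, 10.0        # hypothetical constants
Gamma = 5.0                   # Gamma_0 = L_0, chosen with L_0 >= L_min
history = []
for t in range(1, 201):
    L_t = random.uniform(L_min, 2.0 * L_G)   # stand-in for a backtracked estimate
    a = solve_alpha(L_t, Gamma)
    Gamma_new = Gamma * (1.0 - a)
    history.append((t, L_t, a, Gamma, Gamma_new))
    Gamma = Gamma_new
```

Because $L_t\alpha_t^2 = \Gamma_t$ holds by construction, the sequence $\Gamma_t$ decays like $1/t^2$ whenever $L_t$ stays in $[L_{\min}, 2L_G]$; the recorded history can be compared against the two-sided bound $L_{\min}/(t+1)^2 \le \Gamma_t \le 8L_G/t^2$.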


A few remarks are in order for the above backtracking procedure. Firstly, steps 2 through 8 are the backtracking steps, which terminate only when neither of the conditions in steps 4 and 6 holds. Clearly, in each call to the backtracking procedure, steps 4 and 6 will only be performed finitely many times, and the returned values $L_t$ and $M_t$ satisfy $L_{\min} \le L_t \le 2L_G$ and $M_t \le 2\|K\|$, respectively. Secondly, while $M_t \ge M_{t-1}$ and $\nu_t \ge \nu_{t-1}$, the value of $L_t$ in step 9 is not necessarily greater than $L_{t-1}$. Finally, the multiplier for increasing or decreasing $L_t$ and $M_t$ is 2, which can be replaced by any number that is greater than 1.

The  scheme  of  AADMM  with  backtracking  is  presented  in  Algorithm  3. 


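Algorithm 3 AADMM with backtracking

Choose $x_1 \in X$ and $w_1 \in W$ such that $Bw_1 = Kx_1 + b$, $L_0 \ge L_{\min} > 0$ and $M_0, \nu_0, \rho > 0$. Set $x_1^{ag} \leftarrow x_1$, $w_1^{ag} \leftarrow w_1$, $y_1^{ag} \leftarrow y_1 = 0$, $\Gamma_0 \leftarrow L_0$, $t \leftarrow 1$.
for $t = 1, \ldots, N-1$ do
    ($L_t$, $M_t$, $\Gamma_t$, $\nu_t$, $x_{t+1}$, $x_{t+1}^{ag}$, $\tau_t$, $\rho_t$, $\alpha_t$) $\leftarrow$ BACKTRACKING($L_{t-1}$, $M_{t-1}$, $\Gamma_{t-1}$, $\nu_{t-1}$, $x_t$, $x_t^{ag}$, $L_{\min}$)
    Calculate iterates (2.4) - (2.7).
end for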

We start by considering UCO problems with bounded feasible sets $X$ and $Y$. Theorem 2.10 below summarizes the convergence properties of Algorithm 3 for solving bounded UCO problems.

Theorem 2.10. If we set $\nu_0 = -\infty$ and apply Algorithm 3 to the UCO problem (1.4) under assumption (2.28), then

$$
G(x_{t+1}^{ag}) + F(Kx_{t+1}^{ag}) - f^* \le \frac{4L_GD_X^2}{t^2} + \frac{4L_G}{L_{\min}(t-1)}\left[6\chi\rho\max\{4M_0^2, \|K\|^2\}D_X^2 + (1-\chi)\rho D_{X,K}^2 + \frac{D_Y^2}{\rho}\right]. \tag{2.52}
$$

In particular, if $\rho = \dfrac{D_Y}{\sqrt{6\chi}\max\{2M_0, \|K\|\}D_X + (1-\chi)D_{X,K}}$, then

$$
G(x_{t+1}^{ag}) + F(Kx_{t+1}^{ag}) - f^* \le \frac{4L_GD_X^2}{t^2} + \frac{8L_G}{L_{\min}(t-1)}\left[\sqrt{6\chi}\max\{2M_0, \|K\|\}D_XD_Y + (1-\chi)D_{X,K}D_Y\right]. \tag{2.53}
$$

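Proof. As discussed after Procedure 1, we have

$$
L_{\min} \le L_t \le 2L_G \quad\text{and}\quad 0 < M_t \le 2\|K\|. \tag{2.54}
$$

We can now estimate the bounds of $\alpha_t$ and $\Gamma_t$. By (2.12) we have $1/\Gamma_t = 1/\Gamma_{t-1} + \alpha_t/\Gamma_t$, hence

$$
\frac{1}{\sqrt{\Gamma_t}} - \frac{1}{\sqrt{\Gamma_{t-1}}} = \frac{1/\Gamma_t - 1/\Gamma_{t-1}}{1/\sqrt{\Gamma_t} + 1/\sqrt{\Gamma_{t-1}}} = \frac{\alpha_t/\Gamma_t}{1/\sqrt{\Gamma_t} + 1/\sqrt{\Gamma_{t-1}}}.
$$

Observing from equations (2.12), (2.50) and (2.54) that

$$
\frac{1}{2L_G} \le \frac{\alpha_t^2}{\Gamma_t} \le \frac{1}{L_{\min}}, \tag{2.55}
$$

we have

$$
\frac{1}{\sqrt{\Gamma_t}} - \frac{1}{\sqrt{\Gamma_{t-1}}} \ge \frac{\alpha_t/\Gamma_t}{2/\sqrt{\Gamma_t}} = \frac{\alpha_t}{2\sqrt{\Gamma_t}} \ge \frac{1}{2\sqrt{2L_G}}
$$

and

$$
\frac{1}{\sqrt{\Gamma_t}} - \frac{1}{\sqrt{\Gamma_{t-1}}} \le \frac{\alpha_t/\Gamma_t}{1/\sqrt{\Gamma_t}} = \frac{\alpha_t}{\sqrt{\Gamma_t}} \le \frac{1}{\sqrt{L_{\min}}}.
$$

Therefore, by induction we conclude that

$$
\frac{t}{2\sqrt{2L_G}} \le \frac{1}{\sqrt{\Gamma_t}} \le \frac{1}{\sqrt{\Gamma_0}} + \frac{t}{\sqrt{L_{\min}}} \le \frac{t+1}{\sqrt{L_{\min}}}, \quad\text{or}\quad \frac{L_{\min}}{(t+1)^2} \le \Gamma_t \le \frac{8L_G}{t^2}. \tag{2.56}
$$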


Now  let  us  examine  the  RHS  of  (2.13)  in  Lemma  2.3.  Without  loss  of  generality,  we  assume  that  2Mq  <  ||A'||. 
Indeed,  if  2Mq  >  ||Jsr||;  then  Mt  =  2Mq  for  all  t  >  1.  Since  vm  and  M[t ]  are  monotonically  increasing,  by 
(2.51)  and  (2.27)  in  Lemma  2.f,  we  have 


Bt(x*,X[t+ i],V[t])  < 


1  +  xvtpM2 


(D2x-  ||xt+1-**||2)< 


l  +  xvtpM; 


D 


X 


<  -D\  +  2Xvtp\\K\\2D\, 
vt 


Bt{y,y[t+i\,p[t{)=  Yp{\\yr~y\\2 -ht+i-yf)  <  -^dy> 

Bt(w* ,w[t+1],0[t])  =  ^(IK-t^ll2  -  |K+i  -  w*||2)  <  - yDx,K ■ 
On  the  other  hand,  by  (2.26)  in  Lemma  2.f  we  have 


-xBt{Kx*,Kx[t+ i],0[t])<  - 


|  Kxx  -  Kx*f  -  ||  Kxt+i  ~  Kx*  ||2)  < 


-||A||2||*t+1-x*||2 


<  -ff-\\K\\2Dx- 


Applying  the  above  calculations  on  B  (•,■,•)  to  Lemma  2.3,  we  have 


^WKPDI  + 


< 


D 


x 


3x"tp\\ K\\2D2x 


(1  -  X)vtp 


n*  a-  —  n 2 
UX,K  +  o  UY- 
2  p 
Observe that by (2.55) and (2.56), $\alpha_t/\Gamma_t \le (t+1)/L_{\min}$, and that

$$
\nu_t \le \max_{i \le t}\,\alpha_i/\Gamma_i \le (t+1)/L_{\min}. \tag{2.57}
$$

Using the previous two inequalities and (2.56), we have

$$
\begin{aligned}
Q(w^*, x^*, y; z_{t+1}^{ag}) &\le \frac{4L_GD_X^2}{t^2} + \frac{24\chi\rho L_G(t+1)}{t^2L_{\min}}\|K\|^2D_X^2 + \frac{4(1-\chi)\rho L_G(t+1)}{t^2L_{\min}}D_{X,K}^2 + \frac{4L_G(t+1)}{t^2L_{\min}\rho}D_Y^2 \\
&\le \frac{4L_GD_X^2}{t^2} + \frac{4L_G}{L_{\min}(t-1)}\left[6\chi\rho\|K\|^2D_X^2 + (1-\chi)\rho D_{X,K}^2 + \frac{D_Y^2}{\rho}\right].
\end{aligned}
$$

The above inequality, in view of Proposition 2.2, then implies (2.52) and (2.53). □

For AECCO problems when both $X$ and $Y$ are bounded, we can also apply Algorithm 3 with $\chi = 0$, as long as the maximum number of iterations $N$ is given. Theorem 2.11 below describes the convergence properties of AL-ADMM with backtracking for solving general AECCO problems.

Theorem 2.11. If we choose $\chi = 0$, $\rho = 1$, and $\nu_0 = N/L_{\min}$ in Algorithm 3, then


G(xaN9)  +  F(waNg)  -  /*  <  ALcDk,  +  kLaDS'*.,  and 


II Bw%  -  Kxag  -  b||  < 


(IV-  l)2  Lmin{N-iy 
1  Q\/LGDX-  16\/2y/ LcDx*tK  32LGDy » 


V L~(N  -  1)3/2  -  1) 


>(N-iy 


(2.58) 

(2.59) 


Proof.  In  view  of  step  2  in  Procedure  1,  equation  (2.57)  and  the  choice  ofvo,  we  can  see  that  =  N/Lr 
By  (2.12),  (2.50),  (2.14)  and  (2.51),  we  have 

Bt(x*,x[t+ i],?7[t])  =  ^(£>2,  -  ||xt+i  -a;*||2)  <  ^H2., 

N 


Bt(y,y[t+i],p[t])=  2T  (\\yi-y\\  -\\yt+i-y\\  ),  Vy^y, 


N 


\BWl  -  Bw*\\2  -  ||Bt+1  -  Bw*\\2)  < 


N 


-D 


x* , K  ■ 


or  A  11  11  11  /  -  or 

*J±Jmin  Lj±Jr\ _ 

Using  the  fact  that  rt  >  pt  and  x  =  0,  and  applying  the  above  calculations  to  Lemma  2.3,  we  have 

N 


kQ(w*’x*,y,z?Zi)  <  \d2x ,  + 

Similarly  to  the  proof  of  Theorem  2.9,  we  have 
L 


,K 


2  Ln 


n-y II  ^  ht+i  -y II  ),  Vy  e  y. 


2L„ 


\\yt+i  -  y*W  <  -j^Di,+Di,,K  +  D2„  \\yi  —  yt+i  ||2  <  +  2D^K  +  AD2y„  and 


kQ{w*  ,x*  ,y\ztU)  +  t 

J-  t  ■‘-'min 


N  (yi-yt+i,y)<  \d2x,  +  -^—d2x.<k,  Vy  e  y. 


Setting  vt+\  =  TtN(yi  —  yt+i)/ Lmin,  t  =  N  —  1  and  applying  (2.56),  we  have 

4LgD2x,  4  LgD\^k 


Q{w*,x*,y;z%9)  +  { vN,y )  < 

IMI  < 


NJ  .  w,„,  .=  (Af  l)2  '  Lmin(N  —  1) ’ 

8V2 VLgNDx*  8y/2Ny/T^Dx.,K  ,  16NLGDy 


VL~(N-1)2  y{L—{N^l)2  Lmin(N- 1)2 


< 


16  \/LcDx, 


+ 


16\/2y/LGDx*yK  32  LGDy* 


(2.60) 

(2.61) 


VL~(N  -  1)3/2  VL~(N-1)  Lmin(N  —  1)  ’ 

These  previous  two  relations  together  with  Proposition  2.1  then  imply  (2.58)  and  (2.59).  □ 
3. Numerical examples. In this section, we present some preliminary numerical results for the proposed methods. The numerical experiments are carried out on overlapped LASSO, compressive sensing, and an application to partially parallel image reconstruction. All algorithms are implemented in MATLAB 2013b on a Dell Precision T1700 computer with a 3.4 GHz Intel i7 processor.

3.1.  Group  LASSO  with  overlap.  The  goal  of  this  section  is  to  examine  the  effectiveness  of  the 
proposed  methods  for  solving  UCO  problems  with  unbounded  X.  In  this  experiment,  our  problem  of  interest 
is  the  group  LASSO  model  given  by  [27] 


min  ^2({aux)  -  fij2  +  ll%ll,  (3-1) 

i—1  geQ 

where  {(a,;,  /i)}’U1  C  Rn  x  R  is  a  group  of  datasets,  x  is  the  sparse  feature  to  be  extracted,  and  the  structure 
of  x  is  represented  by  group  Q.  In  particular,  Q  C  and  for  any  g  C  {1, . . .  ,n},  xg  is  a  vector  that  is 

constructed  by  components  of  x  whose  indices  are  in  g ,  i.e. ,  xg  :=  (£i)ieg.  The  first  term  in  (3.1)  describes  the 
fidelity  of  data  observation,  and  the  second  term  is  the  regularization  term  to  enforce  certain  group  sparsity.  In 
particular,  we  assume  that  x  is  sparse  in  the  group- wise  fashion,  i.e.,  for  any  g  £  Q1  xg  is  sparse.  Problem  (3.1) 
can  be  formulated  as  a  UCO  problem  (1.4)  by  defining  the  linear  operator  K  as  Kx  =  A(xJ1,xJ2, . . . ,  £^)T, 
where  gi  £  Q  and  Q  =  {ffi}*=1.  Specially,  if  each  gi  consists  k  elements,  then  (3.1)  becomes 

m™A\\Ax~  f\\2  +  X\\Kx\\k,ii  (3-2) 

x£Kn  Z 


where  A  =  (alt . . . ,  am)T,  f  =  (/i,  ■  •  • ,  fm)T,  and  ||  •  ||fe,i  is  defined  by  ||u||M  :=  Yh=i  \\iu(ki  fc+1),  •  •  • ,  u(fei))T|| 
for  all  u  £  Rfcn,  where  ||  •  ||  is  the  Euclidean  norm  in  Rfc.  Note  that  F(-)  :=  ||  •  || is  simple,  so  the  solution 
of  problem  (1.2)  can  be  obtained  directly  by  examining  the  optimality  condition,  which  is  also  known  as 
soft-thresholding . 
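For the block norm $\|\cdot\|_{k,1}$, the soft-thresholding step mentioned above acts on each block of $k$ consecutive entries separately. The following sketch assumes the subproblem (1.2) has the form $\min_w \frac{\eta}{2}\|w - c\|^2 + \|w\|_{k,1}$ and that blocks are consecutive groups of $k$ entries; the function name `group_soft_threshold` and the test vector are illustrative choices.

```python
import numpy as np

def group_soft_threshold(c, k, eta):
    """Minimizer of (eta/2)*||w - c||^2 + sum over blocks of ||w_block||_2.

    c is split into consecutive blocks of length k; each block is shrunk
    toward zero by 1/eta in Euclidean norm, and set to zero if its norm
    does not exceed 1/eta.
    """
    blocks = np.asarray(c, dtype=float).reshape(-1, k)
    norms = np.linalg.norm(blocks, axis=1, keepdims=True)
    thresh = 1.0 / eta
    safe = np.where(norms > 0.0, norms, 1.0)          # avoid division by zero
    scale = np.where(norms > thresh, 1.0 - thresh / safe, 0.0)
    return (scale * blocks).reshape(-1)

# Two blocks of size 2: the first has norm 5, the second has norm below 1.
w = group_soft_threshold(np.array([3.0, 4.0, 0.1, 0.1]), k=2, eta=1.0)
```

With $\eta = 1$ the first block is scaled by $1 - 1/5$ and the second is zeroed, so each call costs a single pass over the data, which is what makes $F$ "simple" in the sense of (1.2).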

In this experiment, we generate the datasets $\{(a_i, f_i)\}_{i=1}^m$ by $f_i = \langle a_i, x_{true}\rangle + \varepsilon$, where $a_i \sim N(0, I_n)$, $\varepsilon \sim N(0, 0.01)$, and the true feature $x_{true}$ is the $n$-vector form of a $64 \times 64$ two-dimensional signal whose support and intensities are shown in Figure 1. Within its support, the intensities of $x_{true}$ are generated independently from the standard normal distribution. We set $n = 4096$, $m = 2048$ and choose $\mathcal{G}$ to be all the $2 \times 2$ blocks in the $64 \times 64$ domain (so that $k = 4$), and apply L-ADMM, LP-ADMM, AL-ADMM and ALP-ADMM to solve (3.1) in which $\lambda = 1$. The parameters for AL-ADMM and ALP-ADMM are chosen as in Theorem 2.9, and $N$ is set to 300. To have a fair comparison, we use the same Lipschitz constant $L_G = \lambda_{\max}(A^TA) \approx 1.6 \times 10^4$, $\|K\| = 2$ and $\rho = 0.5$ for all algorithms without performing backtracking. Both the primal objective function value $f(x)$ and the feature extraction relative error $r(x)$ at the approximate solution $x \in \mathbb{R}^n$ versus CPU time are reported in Figure 1, where

$$
r(x) := \frac{\|x - x_{true}\|}{\|x_{true}\|}. \tag{3.3}
$$

From Figure 1 we can see that the performances of AL-ADMM and ALP-ADMM are almost the same, and both of them outperform L-ADMM and LP-ADMM. This is consistent with our theoretical observation that AL-ADMM and ALP-ADMM have a better rate of convergence (2.45) than ADMM (2.40).

3.2. Compressive sensing. In this subsection, we present the experimental results on the comparison of ADMM and AADMM for solving the following image reconstruction problem:

$$
\min_{x\in X}\ \frac{1}{2}\|Ax - f\|^2 + \lambda\|Dx\|_{2,1}, \tag{3.4}
$$

where $x$ is the $n$-vector form of a two-dimensional image to be reconstructed, $\|Dx\|_{2,1}$ is the discrete form of the TV semi-norm, $A$ is a given acquisition matrix (depending on the physics of the data acquisition), $f$ represents the observed data, and $X := \{x \in \mathbb{R}^n : l_i \le x^{(i)} \le u_i, \forall i = 1, \ldots, n\}$. Problem (3.4) is a special case of UCO (1.4) with $W = \mathbb{R}^{2n}$, $G(x) = \|Ax - b\|^2/2$, $F(w) = \|w\|_{2,1}$ and $K = \lambda D$. We assume that the finite difference operator $D$ satisfies the periodic boundary condition, so that the problem in (2.2) with $\chi = 0$ can be solved easily by utilizing the Fourier transform (see [49]).
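As a concrete illustration of the operator $D$ with periodic boundary conditions, the sketch below computes forward differences with wrap-around and the resulting isotropic TV value $\|Dx\|_{2,1}$. It also evaluates $\|Du\|^2/\|u\|^2$ on a checkerboard image of even size, where the ratio attains 8, the known upper bound for $\|D\|^2$ of the forward-difference operator. The function names and the test images are hypothetical and are not taken from the experiments.

```python
import numpy as np

def grad_periodic(img):
    """Forward differences with periodic (wrap-around) boundary conditions."""
    dx = np.roll(img, -1, axis=1) - img    # horizontal difference
    dy = np.roll(img, -1, axis=0) - img    # vertical difference
    return dx, dy

def tv_norm(img):
    """Isotropic discrete TV: sum over pixels of the 2-norm of (dx, dy)."""
    dx, dy = grad_periodic(img)
    return float(np.sum(np.sqrt(dx ** 2 + dy ** 2)))

n_side = 64
rows, cols = np.indices((n_side, n_side))
checker = (-1.0) ** (rows + cols)          # +1/-1 checkerboard
dx, dy = grad_periodic(checker)
ratio = (np.sum(dx ** 2) + np.sum(dy ** 2)) / np.sum(checker ** 2)
```

Because `np.roll` implements a circular shift, $D$ is diagonalized by the 2D discrete Fourier transform, which is the property that makes the $\chi = 0$ subproblem cheap to solve.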

In our experiment, we consider two instances where the acquisition matrix $A \in \mathbb{R}^{m\times n}$ is generated independently from a normal distribution $N(0, 1/\sqrt{m})$ and a Bernoulli distribution that takes equal probability for the values $1/\sqrt{m}$ and $-1/\sqrt{m}$, respectively. Both types of acquisition matrices are widely used in compressive sensing (see, e.g., [2]). For a given $A$, the measurements $b$ are generated by $b = Ax_{true} + \varepsilon$, where $x_{true}$ is a 64 by 64 Shepp-Logan phantom [47] with intensities in $[0, 1]$ (so $n = 4096$), and $\varepsilon \sim N(0, 0.001 I_m)$. We choose $m = 1229$ so that the compression ratio is about 30%, and set $\lambda = 10^{-3}$ in (1.6). Considering the
range  of  intensities  of  a we  apply  ALP-ADMM  with  parameters  in  Theorem  2.6  and  LP-ADMM  to  solve 
(3.4)  with  bounded  feasible  set  I  :=  [i  £  I"  :  0  <  <  1  ,  Vi  =  1, . . .  ,n}.  It  should  be  pointed  that  since 

Y  :=  dom F*  =  {y  £  R2n  :  ||3/||2,oo  :=  maxi=ir..iTl  || (y2*-1,  y2l)Th  <  1},  we  have  Dx  =  DY  =  n,  which 
suggests  that  p  =  l/||-£sf||  may  be  a  good  choice  for  p.  We  also  apply  L-ADMM  and  AL-ADMM  to  solve  (3.4), 
with  x  =  1  and  X  =  Rn.  In  this  case  we  use  the  parameters  in  Theorem  2.9  with  N  =  300  for  AL-ADMM.  To 
have  a  fair  comparison,  we  use  the  same  constants  Lq  =  A max(ATA)  and  ||AT||  =  AyB  (see  [9])  and  p  =  1/||AT|| 
for  all  algorithms  without  performing  backtracking.  We  report  both  the  primal  objective  function  value  and 
the  reconstruction  relative  error  (3.3)  versus  CPU  time  in  Figure  3. 
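The data generation described in this subsection can be sketched as follows. This is an illustration with explicit assumptions, not the authors' MATLAB code: the second parameter of $N(0, 1/\sqrt{m})$ is read as a standard deviation, the noise level is passed as a hypothetical `noise_std` argument, a small random vector in $[0, 1]^n$ stands in for the Shepp-Logan phantom, and the sizes are reduced from $n = 4096$, $m = 1229$ to keep the example fast.

```python
import numpy as np

def gaussian_matrix(m, n, rng):
    """Entries i.i.d. normal with mean 0 and (assumed) std 1/sqrt(m)."""
    return rng.normal(loc=0.0, scale=1.0 / np.sqrt(m), size=(m, n))

def bernoulli_matrix(m, n, rng):
    """Entries i.i.d. taking the values +1/sqrt(m) and -1/sqrt(m) with equal probability."""
    return rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

def measure(A, x_true, noise_std, rng):
    """Noisy measurements b = A x_true + eps with i.i.d. Gaussian noise."""
    return A @ x_true + noise_std * rng.standard_normal(A.shape[0])

rng = np.random.default_rng(0)
n = 256                               # reduced size; the experiment uses n = 4096
m = int(round(0.3 * n))               # about 30% compression ratio
x_true = rng.uniform(0.0, 1.0, size=n)   # stand-in for the phantom, intensities in [0, 1]
A_gauss = gaussian_matrix(m, n, rng)
A_bern = bernoulli_matrix(m, n, rng)
b = measure(A_bern, x_true, noise_std=1e-3, rng=rng)
```

With this scaling every column of the Bernoulli matrix has unit Euclidean norm, and the columns of the Gaussian matrix have unit norm in expectation.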

It is evident from Figure 2 that AL-ADMM and ALP-ADMM outperform L-ADMM and LP-ADMM in solving (3.4). This is consistent with our theoretical results in Theorems 2.5, 2.6, 2.8 and 2.9. Moreover, it is interesting to observe that ALP-ADMM with box-constrained $X$ outperforms AL-ADMM with $X = \mathbb{R}^n$. This suggests that knowledge of the ground truth is helpful in solving image reconstruction problems.

3.3. Partially parallel imaging. In this section, we compare the performance of AADMM with
backtracking and Bregman operator splitting with variable stepsize (BOSVS) [12], which is a linearized ADMM
method with backtracking, in reconstruction of magnetic resonance images from partially parallel imaging
(PPI). In magnetic resonance PPI, a set of multi-channel k-space data is acquired simultaneously from
radiofrequency (RF) coil arrays. The imaging is accelerated by sampling a reduced number of k-space samples.
The image reconstruction problem can be modeled as

$$\min_{x \in X} \ \frac{1}{2}\sum_{j=1}^{n_{ch}} \|MFS_jx - f_j\|^2 + \lambda\|Dx\|_{2,1}, \tag{3.5}$$

where $x$ is the vector form of a two-dimensional image to be reconstructed. In (3.5), $n_{ch}$ is the number of MR
sensors, $F \in \mathbb{C}^{n \times n}$ is a 2D discrete Fourier transform matrix, $S_j \in \mathbb{C}^{n \times n}$ is the sensitivity encoding map of
the $j$-th sensor, $M \in \mathbb{R}^{n \times n}$ describes the scanning pattern of MR sensors, and $X \subset \mathbb{C}^n$. In particular,
the $S_j$'s and $M$ are diagonal matrices, and their diagonal vectors $\mathrm{diag}\,S_j \in \mathbb{R}^n$ and $\mathrm{diag}\,M \in \mathbb{R}^n$ are the $n$-vector
forms of images that have the same dimension as the reconstructed image. In practice, $\mathrm{diag}\,S_j$ describes the
sensitivity of the $j$-th sensor at each pixel, and $\mathrm{diag}\,M$ is a mask that takes the value one at the scanned pixels
and zero elsewhere. Figure 4 shows the two-dimensional image representations of $\{\mathrm{diag}\,S_j\}_{j=1}^{n_{ch}}$, $x_{true}$ and
$\mathrm{diag}\,M$. The PPI reconstruction problems are described in more detail in [11]. It should be noted that (3.5)
is a special case of (3.4), and that the percentage of nonzero elements in $\mathrm{diag}\,M$ describes the compression
ratio of the PPI scan. In view of the fact that $\|F\| = \sqrt{n}$, the Lipschitz constant $L_G$ of (3.5) can be estimated by

$$L_G = \Big\|\sum_{j=1}^{n_{ch}} S_j^TF^TM^2FS_j\Big\| \le n\Big\|\sum_{j=1}^{n_{ch}} S_j^TS_j\Big\|_2 = n\Big\|\sum_{j=1}^{n_{ch}} \mathrm{diag}(S_j^TS_j)\Big\|_\infty. \tag{3.6}$$
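The estimate of $L_G$ can be checked numerically. The sketch below is illustrative only: it assumes that $F$ is the unnormalized 2D DFT (so that $\|F\| = \sqrt{n}$, as stated above), uses the conjugate transpose for complex data, and works with small random sensitivity maps and a random 0/1 mask in place of real scanner data. The function names are hypothetical. A power iteration gives a lower estimate of the largest eigenvalue of $\sum_j S_j^HF^HM^2FS_j$, which should not exceed $n \max_i \sum_j |(\mathrm{diag}\,S_j)^{(i)}|^2$.

```python
import numpy as np

def hessian_apply(x, sens, mask):
    """Apply H = sum_j S_j^H F^H M^2 F S_j to an image x (F = unnormalized 2D DFT)."""
    n = x.size
    out = np.zeros_like(x, dtype=complex)
    for s in sens:
        k = mask**2 * np.fft.fft2(s * x)            # M^2 F S_j x
        out += np.conj(s) * (n * np.fft.ifft2(k))   # S_j^H F^H (.), with F^H = n * ifft2
    return out

def lipschitz_bound(sens):
    """Upper bound n * max_i sum_j |s_j^(i)|^2 for diagonal sensitivity maps."""
    n = sens[0].size
    return n * np.max(np.sum(np.abs(sens) ** 2, axis=0))

def power_iteration(apply_h, shape, iters=50, seed=0):
    """Lower estimate of the largest eigenvalue of a Hermitian PSD operator."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(iters):
        y = apply_h(x)
        lam = np.linalg.norm(y)  # ||H x|| with ||x|| = 1 never exceeds lambda_max
        x = y / lam
    return lam

rng = np.random.default_rng(1)
sens = rng.standard_normal((4, 16, 16)) + 1j * rng.standard_normal((4, 16, 16))
mask = (rng.uniform(size=(16, 16)) < 0.3).astype(float)
est = power_iteration(lambda x: hessian_apply(x, sens, mask), (16, 16))
bound = lipschitz_bound(sens)
print(est <= bound)
```

The bound does not depend on the mask $M$, which is why instances that share sensitivity maps but use different sampling trajectories share the same value of $L_G$.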

In this experiment, $n_{ch} = 8$, and the measurements $\{f_j\}_{j=1}^{n_{ch}}$ are generated by

$$f_j = M\left(FS_jx_{true} + \varepsilon_j^{re}/\sqrt{2} + \varepsilon_j^{im}/\sqrt{-2}\right), \quad j = 1,\ldots,n_{ch},$$

where the noises $\varepsilon_j^{re}$ and $\varepsilon_j^{im}$ are independently generated from distribution $N(0, 10^{-4}\sqrt{n}\,I_n)$. We generate four
instances of experiments where the ground truths $x_{true}$ are human brain images (see Figure 4). The
information of the instances is listed in Table 3.1. In particular, instances 1a and 1b have Cartesian and
pseudo-random k-space sampling trajectories respectively but share the same sensitivity map and ground
truth, and so do instances 2a and 2b.

Table 3.1: Data acquisition information in partially parallel image reconstruction.

Instance   Dimension of $x_{true}$   Sampling trajectory   Acquisition rate   $L_G$
1a         $n = 256 \times 256$      Cartesian mask        18%                $3.34 \times 10^5$
1b         $n = 256 \times 256$      Pseudo random mask    24%                $3.34 \times 10^5$
2a         $n = 512 \times 512$      Cartesian mask        18%                $1.60 \times 10^6$
2b         $n = 512 \times 512$      Pseudo random mask    24%                $1.60 \times 10^6$
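As an illustration of the measurement model $f_j = M(FS_jx_{true} + \varepsilon_j^{re}/\sqrt{2} + \varepsilon_j^{im}/\sqrt{-2})$, the sketch below simulates multi-channel k-space data on a small grid. It is not the acquisition pipeline behind Table 3.1: the image, the sensitivity maps and the row-skipping mask are synthetic stand-ins, $F$ is taken to be the unnormalized 2D DFT, and $10^{-4}\sqrt{n}$ is read as the noise variance. The name `ppi_measurements` is hypothetical.

```python
import numpy as np

def ppi_measurements(x_true, sens, mask, noise_var, seed=0):
    """Simulate f_j = M (F S_j x_true + e_re / sqrt(2) + e_im / sqrt(-2)) per channel."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(noise_var)
    data = []
    for s in sens:
        e_re = rng.normal(0.0, sd, size=x_true.shape)
        e_im = rng.normal(0.0, sd, size=x_true.shape)
        # 1 / sqrt(-2) = -1j / sqrt(2), so e_im only enters the imaginary part.
        noise = e_re / np.sqrt(2) + e_im / (1j * np.sqrt(2))
        data.append(mask * (np.fft.fft2(s * x_true) + noise))
    return np.stack(data)

rng = np.random.default_rng(2)
h = w = 32
n_ch = 8
x_true = rng.uniform(size=(h, w))                  # synthetic stand-in image
sens = rng.standard_normal((n_ch, h, w)) + 1j * rng.standard_normal((n_ch, h, w))
mask = np.zeros((h, w))
mask[::4, :] = 1.0                                 # hypothetical mask keeping every 4th row
f = ppi_measurements(x_true, sens, mask, noise_var=1e-4 * np.sqrt(h * w))
rate = mask.mean()                                 # fraction of sampled k-space entries
```

The fraction of nonzero entries of the mask plays the role of the acquisition rate in Table 3.1.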

We first consider $X = \mathbb{C}^n$, and use AL-ADMM with backtracking to solve (3.5). We use the parameters
in Theorem 2.11 with $N = 400$ in all PPI experiments. We also apply the BOSVS method in [12] (see footnote 2),
a backtracking linesearch technique for L-ADMM with Barzilai-Borwein stepsize [3], to solve (3.5) with $X = \mathbb{C}^n$.
Furthermore, noticing that $x_{true}$ lies in the bounded feasible set $X := \{x \in \mathbb{C}^n : |x^{(i)}| \le 1,\ \forall i = 1,\ldots,n\}$, we
also apply ALP-ADMM with backtracking to solve (3.5) with the aforementioned bounded feasible set $X$. We set
the parameters to $\lambda = 10^{-10}n$ in (3.5), and choose $L_0 = \|F\|^2 = n$, $L_{\min} = L_G/10$, $M_0 = \|K\|/10 = \lambda\sqrt{8}/10$
for Algorithm 3, where $L_G$ is listed in Table 3.1.
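The constant $\|K\| = \lambda\sqrt{8}$ rests on the bound $\|D\| \le \sqrt{8}$ for the discrete gradient from [9]. The following sketch checks this bound numerically under the assumption that $D$ is the forward-difference gradient with zero differences at the last row and column; the definition of $D$ is not restated in this section, and the function names are hypothetical. A power iteration on $D^TD$ gives a lower estimate of $\|D\|$.

```python
import numpy as np

def grad2d(u):
    """Forward differences; the last row/column of each component is zero."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def grad_adjoint(gx, gy):
    """Adjoint D^T of grad2d, so that <D u, p> = <u, D^T p>."""
    out = np.zeros_like(gx)
    out[1:, :] += gx[:-1, :]
    out[:-1, :] -= gx[:-1, :]
    out[:, 1:] += gy[:, :-1]
    out[:, :-1] -= gy[:, :-1]
    return out

def op_norm_estimate(shape, iters=200, seed=0):
    """Power iteration on D^T D; returns a lower estimate of ||D||."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(shape)
    u /= np.linalg.norm(u)
    lam = 0.0
    for _ in range(iters):
        v = grad_adjoint(*grad2d(u))
        lam = np.linalg.norm(v)  # lower estimate of lambda_max(D^T D)
        u = v / lam
    return np.sqrt(lam)

est = op_norm_estimate((32, 32))
print(est <= np.sqrt(8))
```

On a finite grid the estimate stays strictly below $\sqrt{8}$ and approaches it as the grid grows, so $\lambda\sqrt{8}$ is a safe, slightly conservative value for $\|K\|$.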

The performance of AL-ADMM, ALP-ADMM and BOSVS is shown in Figures 5 and 6, in terms of both
the primal objective function value and relative error (3.3). It is evident that AL-ADMM and ALP-ADMM
outperform BOSVS in terms of the decrement of both primal objective value and relative error to ground
truth, especially in the case of using the Cartesian sampling trajectory. Since the Cartesian sampling trajectory
in our experiments collects less low-frequency data (the center part in the k-space) and has no randomness
in sampling (see Figure 4), it is harder to obtain a good reconstruction with it than with the
pseudo-random sampling trajectory. Our experimental results indicate that in this case AADMM is much more
efficient than BOSVS in reconstruction. This observation
is consistent with our theoretical results in Theorems 2.10 and 2.11.

4. Conclusion. We present in this paper the AADMM framework by incorporating a multi-step
acceleration scheme into linearized ADMM. AADMM has better rates of convergence than linearized ADMM for
solving a class of convex composite optimization problems with linear constraints, in terms of the Lipschitz constant of
the smooth component. Moreover, AADMM can handle both bounded and unbounded feasible sets, as long
as a saddle point exists. For the unbounded case, the estimate of the rate of convergence depends on the
distance from the initial point to the set of saddle points. We also propose a backtracking scheme to improve the
practical performance of AADMM. Our preliminary numerical results show that AADMM is promising for
solving large-scale convex composite optimization problems with linear constraints.

Acknowledgment.  The  authors  would  like  to  thank  Invivo  Philips,  Gainesville,  FL  for  providing  the 
PPI  brain  scan  datasets. 


REFERENCES 

[1] A. Auslender and M. Teboulle. Interior gradient and proximal methods for convex and conic optimization. SIAM Journal on Optimization, 16(3):697-725, 2006.

[2] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253-263, 2008.

[3] J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8(1):141-148, 1988.


2 The BOSVS code is available at http://people.math.gatech.edu/~xye33/software/BOSVS.zip




[4] S. Becker, J. Bobin, and E. Candès. NESTA: a fast and accurate first-order method for sparse recovery. SIAM Journal on Imaging Sciences, 4(1):1-39, 2011.

[5] D. P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, 1982.

[6] D. P. Bertsekas. Nonlinear programming. Athena Scientific, 1999.

[7] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1-122, 2011.

[8] R. S. Burachik, A. N. Iusem, and B. F. Svaiter. Enlargement of monotone operators with applications to variational inequalities. Set-Valued Analysis, 5(2):159-180, 1997.

[9] A. Chambolle. An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20(1):89-97, 2004.

[10] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120-145, 2011.

[11] Y. Chen, W. Hager, F. Huang, D. Phan, X. Ye, and W. Yin. Fast algorithms for image reconstruction with application to partially parallel MR imaging. SIAM Journal on Imaging Sciences, 5(1):90-118, 2012.

[12] Y. Chen, W. W. Hager, M. Yashtini, X. Ye, and H. Zhang. Bregman operator splitting with variable stepsize for total variation image reconstruction. Computational Optimization and Applications, 54(2):317-342, 2013.

[13] Y. Chen, G. Lan, and Y. Ouyang. Optimal primal-dual methods for a class of saddle point problems. UCLA CAM report 13-31, 2013.

[14] P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation, 4(4):1168-1200, 2005.

[15] A. d'Aspremont. Smooth optimization with approximate gradient. SIAM Journal on Optimization, 19(3):1171-1183, 2008.

[16] J. Douglas and H. Rachford. On the numerical solution of heat conduction problems in two and three space variables. Transactions of the American Mathematical Society, 82(2):421-439, 1956.

[17] J. Eckstein and D. P. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1-3):293-318, 1992.

[18] E. Esser, X. Zhang, and T. Chan. A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences, 3(4):1015-1046, 2010.

[19] D. Gabay. Applications of the method of multipliers to variational inequalities. In M. Fortin and R. Glowinski, editors, Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, volume 15 of Studies in Mathematics and Its Applications, pages 299-331. Elsevier, 1983.

[20] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17-40, 1976.

[21] R. Glowinski and A. Marroco. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, 9(R2):41-76, 1975.

[22] D. Goldfarb, S. Ma, and K. Scheinberg. Fast alternating linearization methods for minimizing the sum of two convex functions. Mathematical Programming, pages 1-34, 2010.

[23] T. Goldstein, B. O'Donoghue, and S. Setzer. Fast alternating direction optimization methods. CAM report, pages 12-35, 2012.

[24] T. Goldstein and S. Osher. The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences, 2(2):323-343, 2009.

[25] B. He and X. Yuan. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM Journal on Numerical Analysis, 50(2):700-709, 2012.

[26] M. R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4(5):303-320, 1969.

[27] L. Jacob, G. Obozinski, and J.-P. Vert. Group lasso with overlap and graph lasso. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 433-440. ACM, 2009.

[28] G. Lan. Bundle-level type methods uniformly optimal for smooth and non-smooth convex optimization. Manuscript, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, 2013.

[29] G. Lan, Z. Lu, and R. D. Monteiro. Primal-dual first-order methods with O(1/ε) iteration-complexity for cone programming. Mathematical Programming, 126(1):1-29, 2011.

[30] G. Lan and R. D. Monteiro. Iteration-complexity of first-order augmented Lagrangian methods for convex programming. Manuscript. School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta (May, 2009), 2009.

[31] G. Lan and R. D. Monteiro. Iteration-complexity of first-order penalty methods for convex programming. Mathematical Programming, pages 1-25, 2013.

[32] P.-L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964-979, 1979.

[33] Z.-Q. Luo. On the linear convergence of the alternating direction method of multipliers. arXiv preprint arXiv:1208.3922, 2012.

[34] R. D. Monteiro and B. F. Svaiter. Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM Journal on Optimization, 23(1):475-507, 2013.

[35] R. D. Monteiro and B. F. Svaiter. On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM Journal on Optimization, 20(6):2755-2787, 2010.

[36] R. D. Monteiro and B. F. Svaiter. Complexity of variants of Tseng's modified F-B splitting and Korpelevich's methods for hemivariational inequalities with applications to saddle-point and convex optimization problems. SIAM Journal on Optimization, 21(4):1688-1720, 2011.

[37] J.-J. Moreau. Décomposition orthogonale d'un espace hilbertien selon deux cônes mutuellement polaires. (French). CR Acad. Sci. Paris, 255:238-240, 1962.

[38] Y. Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 16(1):235-249, 2005.

[39] Y. E. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). Doklady AN SSSR, 269:543-547, 1983. Translated as Soviet Math. Docl.

[40] Y. E. Nesterov. Smooth minimization of nonsmooth functions. Mathematical Programming, 103:127-152, 2005.

[41] J. Nocedal and S. J. Wright. Numerical optimization. Springer Science+Business Media, 2006.

[42] H. Ouyang, N. He, L. Tran, and A. G. Gray. Stochastic alternating direction method of multipliers. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 80-88, 2013.

[43] J. Peña. Nash equilibria computation via smoothing techniques. Optima, 78:12-13, 2008.

[44] M. J. D. Powell. A method for nonlinear constraints in minimization problems. In Optimization (Sympos., Univ. Keele, Keele, 1968), pages 283-298. Academic Press, London, 1969.

[45] R. T. Rockafellar. Convex analysis. Princeton University Press (Princeton, NJ), 1970.

[46] R. T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877-898, 1976.

[47] L. A. Shepp and B. F. Logan. The Fourier reconstruction of a head section. Nuclear Science, IEEE Transactions on, 21(3):21-43, 1974.

[48] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM Journal on Optimization, 2008.

[49] Y. Wang, J. Yang, W. Yin, and Y. Zhang. A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences, 1(3):248-272, 2008.

[50] X. Ye, Y. Chen, and F. Huang. Computational acceleration for MR image reconstruction in partially parallel imaging. Medical Imaging, IEEE Transactions on, 30(5):1055-1063, 2011.

[51] X. Ye, Y. Chen, W. Lin, and F. Huang. Fast MR image reconstruction for partially parallel imaging with arbitrary k-space trajectories. IEEE Transactions on Medical Imaging, 30(3):575-585, 2011.




Fig. 1: True feature $x_{true}$ in the experiment of group LASSO with overlap. Left: the support of $x_{true}$. Right: the intensities
of $x_{true}$.


Fig. 2: Comparisons of AL-ADMM, ALP-ADMM, L-ADMM and LP-ADMM in group LASSO with overlap. Left: the objective
function values $f(x_t^{ag})$ from AL-ADMM and ALP-ADMM, and $f(x_t)$ from L-ADMM and LP-ADMM vs. CPU time. The straight
line at the bottom is $f(x_{true})$. Right: the relative errors $r(x_t^{ag})$ from AL-ADMM and ALP-ADMM and $r(x_t)$ from L-ADMM
and LP-ADMM vs. CPU time.

Fig. 3: Comparisons of AL-ADMM, ALP-ADMM, L-ADMM and LP-ADMM in image reconstruction. The top and bottom
rows, respectively, show the performance of these algorithms on the "Gaussian" and "Bernoulli" instances. Left: the objective
function values $f(x_t^{ag})$ from AL-ADMM and ALP-ADMM, and $f(x_t)$ from L-ADMM and LP-ADMM vs. CPU time. The straight
line at the bottom is $f(x_{true})$. Right: the relative errors $r(x_t^{ag})$ from AL-ADMM and ALP-ADMM, and $r(x_t)$ from L-ADMM and
LP-ADMM vs. CPU time.

Fig. 4: Sensitivity map $\{\mathrm{diag}\,S_j\}_{j=1}^{n_{ch}}$ (left), ground truth $x_{true}$ (middle) and mask $\mathrm{diag}\,M$ (right) in partially parallel image
reconstruction. (a): The sensitivity maps in instances 1a and 1b. (b): The ground truth in instances 1a and 1b. (c): The k-space
sampling trajectory in instances 1a (top) and 1b (bottom). (d): The sensitivity maps in instances 2a and 2b. (e): The ground
truth in instances 2a and 2b. (f): The k-space sampling trajectory in instances 2a (top) and 2b (bottom).

Fig. 5: Comparisons of AL-ADMM, ALP-ADMM and BOSVS in partially parallel image reconstruction. From top to bottom:
performances of algorithms in instances 1a and 1b. Left: the objective function values vs. CPU time. The straight line at the
bottom is $f(x_{true})$. Right: the relative errors vs. CPU time.

Fig. 5: Comparisons of AL-ADMM, ALP-ADMM and BOSVS in partially parallel image reconstruction (cont'd). From top to
bottom: performances of algorithms in instances 2a and 2b. Left: the objective function values vs. CPU time. The straight line
at the bottom is $f(x_{true})$. Right: the relative errors vs. CPU time.

Fig. 6: Comparison of AL-ADMM, ALP-ADMM and BOSVS in partially parallel image reconstruction. From top to bottom:
Reconstructed images and reconstruction errors in instances 1a and 1b, respectively. From left to right: AL-ADMM, ALP-ADMM
and BOSVS.

Fig. 6: Comparison of AL-ADMM, ALP-ADMM and BOSVS in partially parallel image reconstruction (cont'd). From top to
bottom: Reconstructed images and reconstruction errors in instances 2a and 2b, respectively. From left to right: AL-ADMM,
ALP-ADMM and BOSVS.